An Introduction to Mathematical Economics
Part 1

Michael Sampson

Loglinear Publishing

Copyright © 2001 Michael Sampson.
Loglinear Publications: http://www.loglinear.com    Email: [email protected]
Terms of Use
This document is distributed "AS IS" and with no warranties of any kind, whether express or
implied.
Until November 1, 2001 you are hereby given permission to print one (1) and only one hardcopy
version free of charge from the electronic version of this document (i.e., the pdf file) provided
that:
1. The printed version is for your personal use only.
2. You make no further copies from the hardcopy version. In particular no photocopies,
electronic copies or any other form of reproduction.
3. You agree not to ever sell the hardcopy version to anyone else.
4. You agree that if you ever give the hardcopy version to anyone else that this page, in
particular the Copyright Notice and the Terms of Use are included and the person to
whom the copy is given accepts these Terms of Use.
Until November 1, 2001 you are hereby given permission to make (and if you wish sell) an
unlimited number of copies on paper only from the electronic version (i.e., the pdf file) of this
document or from a printed copy of the electronic version of this document provided that:
1. You agree to pay a royalty of either $3.00 Canadian or $2.00 US per copy to the
author within 60 days of making the copies or to destroy any copies after 60 days for
which you have not paid the royalty of $3.00 Canadian or $2.00 US per copy.
Payment can be made either by cheque or money order and should be sent to the
author at:
Professor Michael Sampson
Department of Economics
Concordia University
1455 de Maisonneuve Blvd W.
Montreal, Quebec
Canada, H3G 1M8
2. If you intend to make five or more copies, or if you can reasonably expect that five
or more copies of the text will be made then you agree to notify the author before
making any copies by Email at: [email protected] or by fax at 514-848-4536.
3. You agree to include on each paper copy of this document and at the same page
number as this page on the electronic version of the document: 1) the above
Copyright Notice, 2) the URL: http://www.loglinear.com and the Email address
[email protected]. You may then if you wish remove this Terms of Use
from the paper copies you make.
Contents

Preface

1 The Mathematical Method
   1.1 Definitions
   1.2 The Difference Between ‘=’ and ‘≡’
   1.3 Implication
   1.4 Negation
   1.5 Proof by Contradiction
   1.6 Necessary Conditions and Sufficient Conditions
   1.7 Necessary and Sufficient Conditions
   1.8 ‘Or’ and ‘And’
   1.9 The Quantifiers ∃ and ∀
   1.10 Proof by Counter-Example
   1.11 Proof by Induction
   1.12 Functions
       1.12.1 Integer Exponents
       1.12.2 Polynomials
       1.12.3 Non-integer Exponents
       1.12.4 The Geometric Series

2 Univariate Calculus
   2.1 Derivatives
       2.1.1 Slopes
       2.1.2 Derivatives
       2.1.3 The Use of the Word ‘Marginal’ in Economics
       2.1.4 Elasticities
       2.1.5 The Constant Elasticity Functional Form
       2.1.6 Local and Global Properties
       2.1.7 The Sum, Product and Quotient Rules
       2.1.8 The Chain Rule
       2.1.9 Inverse Functions
       2.1.10 The Derivative of an Inverse Function
       2.1.11 The Elasticity of an Inverse Function
   2.2 Second Derivatives
       2.2.1 Convexity and Concavity
       2.2.2 Economics and ‘Diminishing Marginal ...’
   2.3 Maximization and Minimization
       2.3.1 First-Order Conditions
       2.3.2 Second-Order Conditions
       2.3.3 Sufficient Conditions for a Global Maximum or Minimum
       2.3.4 Profit Maximization
   2.4 Econometrics
       2.4.1 Least Squares Estimation
       2.4.2 Maximum Likelihood
   2.5 Ordinal and Cardinal Properties
       2.5.1 Class Grades
       2.5.2 Ordinal and Cardinal Properties of Functions
       2.5.3 Concavity and Convexity are Cardinal Properties
       2.5.4 Quasi-Concavity and Quasi-Convexity
       2.5.5 New Sufficient Conditions for a Global Maximum or Minimum
   2.6 Exponential Functions and Logarithms
       2.6.1 Exponential Growth and the Rule of 72
   2.7 Taylor Series
       2.7.1 The Error of the Taylor Series Approximation
       2.7.2 The Taylor Series for e^x and ln(1 + x)
       2.7.3 L’Hôpital’s Rule
       2.7.4 Newton’s Method
   2.8 Technical Issues
       2.8.1 Continuity and Differentiability
       2.8.2 Corner Solutions
       2.8.3 Advanced Concavity and Convexity

3 Matrix Algebra
   3.1 Matrix Addition and Subtraction
       3.1.1 The Matrix 0
   3.2 Matrix Multiplication
       3.2.1 The Identity Matrix
   3.3 The Transpose of a Matrix
       3.3.1 Symmetric Matrices
       3.3.2 Proof that AᵀA is Symmetric
   3.4 The Inverse of a Matrix
       3.4.1 Diagonal Matrices
   3.5 The Determinant of a Matrix
       3.5.1 Determinants of Upper and Lower Triangular Matrices
       3.5.2 Calculating the Inverse of a Matrix with Determinants
   3.6 The Trace of a Matrix
   3.7 Higher Dimensional Spaces
       3.7.1 Vectors as Points in an n Dimensional Space: ℝⁿ
       3.7.2 Length and Distance
       3.7.3 Angle and Orthogonality
       3.7.4 Linearly Independent Vectors
   3.8 Solving Systems of Equations
       3.8.1 Cramer’s Rule
   3.9 Eigenvalues and Eigenvectors
       3.9.1 Eigenvalues
       3.9.2 Eigenvectors
       3.9.3 The Relationship A = CΛC⁻¹
       3.9.4 Left and Right-Hand Eigenvectors
       3.9.5 Symmetric and Orthogonal Matrices
   3.10 Linear and Quadratic Functions in ℝⁿ⁺¹
       3.10.1 Linear Functions
       3.10.2 Quadratics
       3.10.3 Positive and Negative Definite Matrices
       3.10.4 Using Determinants to Check for Definiteness
       3.10.5 Using Eigenvalues to Check for Definiteness
       3.10.6 Maximizing and Minimizing Quadratics
   3.11 Idempotent Matrices
       3.11.1 Important Properties of Idempotent Matrices
       3.11.2 The Spectral Representation
   3.12 Positive Matrices
       3.12.1 The Perron-Frobenius Theorem
       3.12.2 Markov Chains
       3.12.3 General Equilibrium and Matrix Algebra

4 Multivariate Calculus
   4.1 Functions of Many Variables
   4.2 Partial Derivatives
       4.2.1 The Gradient
       4.2.2 Interpreting Partial Derivatives
       4.2.3 The Economic Language of Partial Derivatives
       4.2.4 The Use of the Word Marginal
       4.2.5 Elasticities
       4.2.6 The Chain Rule
       4.2.7 A More General Multivariate Chain Rule
       4.2.8 Homogeneous Functions
       4.2.9 Homogeneity and the Absence of Money Illusion
       4.2.10 Homogeneity and the Nature of Technology
   4.3 Second-Order Partial Derivatives
       4.3.1 The Hessian
       4.3.2 Concavity and Convexity
       4.3.3 First and Second-Order Taylor Series
   4.4 Unconstrained Optimization
       4.4.1 First-Order Conditions
       4.4.2 Second-Order Conditions
   4.5 Quasi-Concavity and Quasi-Convexity
       4.5.1 Ordinal and Cardinal Properties
       4.5.2 Sufficient Conditions for a Global Maximum or Minimum
       4.5.3 Indifference Curves and Quasi-Concavity
   4.6 Constrained Optimization
       4.6.1 The Lagrangian
       4.6.2 First-Order Conditions
       4.6.3 Second-Order Conditions
       4.6.4 Sufficient Conditions for a Global Maximum or Minimum
   4.7 Econometrics
       4.7.1 Linear Regression
       4.7.2 Maximum Likelihood
Preface
I would like to thank my students for struggling through earlier versions of this
text. In particular I would like to thank Maxime Comeau, Bulent Yurtsever,
Patricia Carvajal, Alain Lumbroso and Saif Al-Haroun for pointing out errors
and typos.
Here are ‘some points of view’ on economics and mathematics:
It is clear that Economics, if it is to be a science at all, must be
a mathematical science. -William Jevons ( Jevons was one of the
early mathematical economists).
There can be no question, however, that prolonged commitment to
mathematical exercises in economics can be damaging. It leads to
the atrophy of judgement and intuition. -John Kenneth Galbraith
(Galbraith is a famous Canadian economist; an advisor to President
Kennedy in the 1960’s; author of many popular books of which our
former prime minister Trudeau was a big fan. Gets no respect from
academic economists.)
The age of chivalry is gone. That of sophisters, economists and
calculators has succeeded. -Edmund Burke.
I advise my students to listen carefully the moment they decide to
take no more mathematics courses. They might be able to hear the
sound of closing doors. -James Caballero.
The effort of the economist is to ‘see,’ to picture the interplay of
economic elements. The more clearly cut these elements appear in
his vision, the better; the more elements he can grasp and hold in
his mind at once, the better. The economic world is a misty region.
The first explorers used unaided vision. Mathematics is the lantern
by which what before was dimly visible now looms up in firm, bold
outlines. The old phantasmagoria disappear. We see better. We
also see further. -Irving Fisher (early 20th century US monetary
economist, famous for the Fisher equation: nominal interest rate
equals real interest rate plus the rate of inflation).
In mathematics you don’t understand things. You just get used to
them. -John von Neumann (One of the great mathematical brains of
the 20th century. Famous in economics for developing game theory
and for the von Neumann growth model.)
One of the big misapprehensions about mathematics that we perpetrate in our classrooms is that the teacher always seems to know the
answer to any problem that is discussed. This gives students the idea
that there is a book somewhere with all the right answers to all of the
interesting questions, and that teachers know those answers. And if
one could get hold of the book, one would have everything settled.
That’s so unlike the true nature of mathematics. -Leon Henkin.
Mathematics. - Let us introduce the refinement and rigor of mathematics into all sciences as far as this is at all possible, not in the
faith that this will lead us to know things but in order to determine
our human relation to things. Mathematics is merely the means for
general and ultimate knowledge of man. -Friedrich Nietzsche (19th
century philosopher, an atheist, famous for his claim that “God is
dead”.)
If we have no aptitude or natural taste for geometry, this does not
mean that our faculty for attention will not be developed by wrestling
with a problem or studying a theorem. On the contrary it is almost
an advantage. It does not even matter much whether we succeed in
finding the solution or understanding the proof, although it is important to try really hard to do so. Never in any case whatever is a
genuine effort of the attention wasted. It always has its effect on the
spiritual plane and in consequence on the lower one of the intelligence, for all spiritual light lightens the mind. If we concentrate our
attention on trying to solve a problem of geometry, and if at the end
of an hour we are no nearer to doing so than at the beginning, we
have nevertheless been making progress each minute of that hour in
another more mysterious dimension. Without our knowing or feeling
it, this apparently barren effort has brought more light into the soul.
The result will one day be discovered in prayer. Moreover, it may
very likely be felt in some department of the intelligence in no way
connected with mathematics. Perhaps he who made the unsuccessful
effort will one day be able to grasp the beauty of a line of Racine more
vividly on account of it. But it is certain that this effort will bear
its fruit in prayer. There is no doubt whatever about that. -Simone
Weil (20th century Christian mystic; her brother André Weil was
one of the great mathematicians of the 20th century).
Chapter 1
The Mathematical Method
1.1 Definitions
Mathematics has no symbols for confused ideas. -Anonymous
“When I use a word,” Humpty Dumpty said in a rather a scornful
tone, “it means just what I choose it to mean – neither more nor
less.” “The question is,” said Alice, “whether you can make words
mean different things.” “The question is,” said Humpty Dumpty,
“which is to be master – that’s all.” -Lewis Carroll, Through the
Looking Glass
In economics we strive for precise thinking, and one of the ways we do this
is by using mathematics. The beginning of this practice is to be clear about
what we are talking about, and for this we need definitions.
We begin with some elementary number theory in order to illustrate the
mathematical methods that we will later apply to economic models. Suppose
then we are interested in the properties of odd and even numbers. Now
intuitively you may know that 4 is even and 5 is odd. If, however, we wish to
prove things about odd and even numbers, then we have to be able to define
what we mean by an odd and an even number.
Consider then proving that the product of an odd and an even number is
always an even number. It is not enough to make a list such as:
4 × 5 = 20
2 × 3 = 6
12 × 37 = 444
etc.
and note that 20, 6 and 444 are even numbers. This is not a proof! Nor would
it be a proof to make the list even longer, because there are an infinite number
of odd and even combinations.
Without definitions we have nowhere to begin!
Now one possible definition of even and odd numbers would be:

Definition 1 An integer m is an even number if and only if there exists an
integer n such that:
m = 2 × n.

Definition 2 An integer m is an odd integer if and only if there exists an
integer n such that:
m = 2 × n + 1.

For example, according to the definition 18 is an even integer because we can
write it as 18 = 2 × n where n = 9, while 5 is an odd integer because we can
write it as 5 = 2 × n + 1 where n = 2.
Armed with these definitions we can now prove something:

Theorem 3 The product of an odd and an even number is an even number.

Proof. If a is even and b is odd then a = 2m and b = 2n + 1 for some integers
m and n, and:
a × b = 2m × (2n + 1)
      = 2 × (m × (2n + 1))
      = 2 × r
where r = m × (2n + 1) is an integer. Thus a × b is an even number.
Notice the power of this kind of reasoning. In a few short lines we have been
able to prove a result that applies to an in…nity of numbers! This in…nity is a
list of numbers which would go past the moon or even past the most distant
star and yet we are able to say something quite definite about it. This is the
magic of mathematics!
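No computation can check all infinitely many cases the theorem covers, but a finite check can at least illustrate the definitions at work. The following Python sketch (an illustration only, not part of the proof; the search range is arbitrary) builds odd and even numbers exactly as in Definitions 1 and 2 and tests the product of each pair:

```python
# Finite sanity check of Theorem 3 (an illustration, not a proof): the
# product of an odd and an even number is always even.  By Definitions
# 1 and 2, even numbers have the form 2*n and odd numbers 2*r + 1.
def is_even(m: int) -> bool:
    return m % 2 == 0

def is_odd(m: int) -> bool:
    return m % 2 == 1

# Check every odd/even pair built from n, r in a small arbitrary range.
for n in range(-50, 50):
    for r in range(-50, 50):
        even, odd = 2 * n, 2 * r + 1
        assert is_even(even * odd), f"counterexample: {even} * {odd}"

print("no counterexample in the tested range")
```

Of course, no matter how large the range, this remains the list-making strategy dismissed above; only the proof covers every case.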
1.2 The Difference Between ‘=’ and ‘≡’
One day in microeconomics, the professor was writing up the typical
“underlying assumptions” in preparation to explain a new model. I
turned to my friend and asked, “What would Economics be without
assumptions?” He thought for a moment, then replied, “Accounting.”
Sometimes things are equal to each other simply by definition. For example,
if
A = “the number of bachelors in Montreal”
B = “the number of unmarried men in Montreal”
then A and B are equal to each other by definition. There is nothing to prove
here and it says nothing about the world or Montreal.
To emphasize the nature of this kind of equality we use a special kind of
equal sign, ‘≡’, so that for bachelors and unmarried men in Montreal we write:
A ≡ B.
This says then that A and B are equal by de…nition or that this is an accounting
identity.
When you see this equality sign you can relax! There is nothing to prove,
these things are merely different notations that mean the same thing.
In economics a good example of an accounting identity is the GNP identity
you learn in macroeconomics:
Y ≡ C + I + G + X − M
where Y is GNP, C is consumption, I is investment, G is government expenditure, X is exports and M is imports.
On the other hand, sometimes things are equal in a more important way.
For example E = mc² expresses an important fact in physics, while f(x) = x²
and f′(x) = 2x give us real information about the function f(x). In these cases
we use = as a way of emphasizing that real information is being provided.
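The GNP identity can be seen in computational terms: because Y is defined as the sum of its components, the equality is true by construction and there is nothing to test against the world. A minimal Python sketch (the figures are hypothetical, for illustration only):

```python
# The GNP identity Y ≡ C + I + G + X - M is an accounting identity:
# Y is *computed* from the right-hand side, so the equality holds by
# construction.  Hypothetical figures, in billions.
C, I, G, X, M = 650.0, 180.0, 220.0, 320.0, 300.0
Y = C + I + G + X - M  # nothing here to verify against the world
print(Y)  # 1070.0
```

Contrast this with f′(x) = 2x, which is a fact about the function x² that had to be derived, not stipulated.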
1.3 Implication
In mathematical economics we begin with assumptions and from there attempt
to deduce true implications of these assumptions. Fundamental to this kind of
reasoning is the idea of logical implication; that if A is true then it follows that
B must also be true. We write this formally as:
A ⇒ B,
which is to say that A implies B. (Sometimes you will see the notation A ⊃ B
instead.)
Example 1: If
A = “Mr. Smith lives in Montreal”
B = “Mr. Smith lives in the province of Quebec”
then since the city of Montreal is in the province of Quebec it follows that
A ⇒ B.
We will often be attempting to construct proofs of statements like A ⇒ B.
Often the link between A and B is not obvious and we need to find a
series of intermediate implications, so that a proof often takes the form:
A ⇒ S1 ⇒ S2 ⇒ · · · ⇒ Sn ⇒ B
from which we conclude that A ⇒ B. Thus the general strategy in
proving A ⇒ B is to begin with A and to use a series of correct implications
to finally obtain the statement B.
Example 2: Suppose that:
A = “a is odd and b is odd”
B = “a + b is an even number”
and we wish to prove that A ⇒ B, that is:

Theorem 4 The sum of two odd numbers is even.
Proof. Given A it follows that a and b are odd, so that a = 2r + 1 and
b = 2s + 1 for some integers r and s, so that:
A ⇒ a = 2r + 1, b = 2s + 1
  ⇒ a + b = (2r + 1) + (2s + 1)
  ⇒ a + b = 2(r + s + 1)
  ⇒ a + b = 2t where t = r + s + 1
  ⇒ ‘a + b is an even number’ = B.
Note that there is a direction to the arrow ⇒. This is to convey the idea
that the truth of statement A is communicated to the truth of the statement B,
but it is not necessarily the case that the truth of B implies the truth of A. It
is incorrect to conclude from A ⇒ B that B ⇒ A.
Example 1: If B is true, so that Mr. Smith lives in the province of Quebec, we
cannot conclude that A is true, that he lives in the city of Montreal. He may
for example live in another city in Quebec, say Sherbrooke. Thus A ⇒ B is
true while B ⇒ A is false.
Example 2: If B is “the sum a + b is an even number,” we cannot conclude A,
that is, “a and b are each odd numbers.” For example if a = 4 and b = 6 then
B is true since 4 + 6 = 10, but A is not true since neither a nor b is odd. Thus
A ⇒ B is true while B ⇒ A is false.
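Both points above can be illustrated by computation (a finite check, not a proof; the ranges are arbitrary): Theorem 4 holds for every pair of odd numbers tried, while the converse already fails at a = 4, b = 6.

```python
# Illustration (not a proof) of Theorem 4, and of the failure of the
# converse B => A.
def is_odd(m: int) -> bool:
    return m % 2 == 1

def is_even(m: int) -> bool:
    return m % 2 == 0

# A => B: any two odd numbers in the tested range sum to an even number.
for r in range(-30, 30):
    for s in range(-30, 30):
        assert is_even((2 * r + 1) + (2 * s + 1))

# B does not imply A: 4 + 6 is even, yet neither 4 nor 6 is odd.
a, b = 4, 6
assert is_even(a + b) and not (is_odd(a) and is_odd(b))
print("A => B holds on the tested range; B => A fails at a = 4, b = 6")
```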
1.4 Negation
Let ~A denote the negation of A, that is, “not A”: A is ‘not true’ or A is
‘false’. For example if A is the statement “Mr. Smith lives in Montreal” then
~A is the statement “Mr. Smith does not live in Montreal”. The negation
sign acts like a negative sign in arithmetic since:
~(~A) = A.
If we have shown A ⇒ B, we have seen that we cannot conclude from
this that B ⇒ A. However, we can correctly conclude from A ⇒ B that
~B ⇒ ~A, or:
if A ⇒ B then ~B ⇒ ~A.
Example 1: In the Montreal/Quebec example we can correctly conclude from
A ⇒ B that ~B ⇒ ~A. If ~B, Mr. Smith does not live in Quebec, then
~A follows: he does not live in Montreal.
Example 2: In the arithmetic example we can correctly conclude from ~B,
that “a + b is not an even number,” that ~A, that it is not the case that both
a and b are odd.
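Unlike statements about all integers, the rule relating an implication to its contrapositive involves only the truth values of A and B, so it can be verified exhaustively: with two statements there are just four truth assignments. A Python sketch:

```python
# Exhaustive check that an implication is equivalent to its
# contrapositive, but not to its converse.  With only two statements
# there are just four truth assignments, so this check is complete.
from itertools import product

def implies(p: bool, q: bool) -> bool:
    return (not p) or q  # material implication

for A, B in product([True, False], repeat=2):
    assert implies(A, B) == implies(not B, not A)  # contrapositive

# The converse is a genuinely different statement:
assert implies(False, True) != implies(True, False)
print("A => B is equivalent to ~B => ~A, but not to B => A")
```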
1.5 Proof by Contradiction
Reductio ad absurdum, which Euclid loved so much, is one of a mathematician’s finest weapons. It is a far finer gambit than any chess
play: a chess player may offer the sacrifice of a pawn or even a piece,
but a mathematician offers the game. -G. H. Hardy.
Proof by contradiction or ‘reductio ad absurdum’ involves proving a statement A by assuming the opposite, ~A, and deriving a contradiction. Thus
if:
~A ⇒ B and ~A ⇒ ~B
then ~A must be false and hence A must be true.
Example:
He is unworthy of the name of man who is ignorant of the fact that
the diagonal of a square is incommensurable with its side. -Plato
Consider proving that:

Theorem 5 √2 is irrational; that is, there are no integers a and b such that:
√2 = a/b.

Proof. Let us assume, to the contrary, that √2 is rational, so that there
exist integers a and b such that:
√2 = a/b.
We can furthermore assume, without loss of generality, that a and b are not
both even, since if a = 2r and b = 2s then:
√2 = a/b = 2r/2s = r/s.
For example if it were the case that a = 8 and b = 6 then, since 8/6 = 4/3, we
could instead use a = 4 and b = 3. Now we have:
√2 = a/b ⇒ a² = 2b²
         ⇒ a² is an even number
         ⇒ a is an even number
since if a were odd then a² would also be odd (you might want to prove this).
Therefore we can write a as:
a = 2n
where n is some integer. Using this in a² = 2b² we have:
2b² = a² = (2n)² = 4n²
⇒ b² = 2n²
⇒ b is an even number.
Thus both a and b are even numbers, which contradicts the requirement that
a and b cannot both be even. Therefore the original assumption that √2 is
rational must be false, so that √2 is irrational. QED.
Remark: You will often see the letters ‘QED’ put at the end of a proof. These
letters stand for the Latin phrase “quod erat demonstrandum,” which means
“that which was to be demonstrated.” This just means that the proof is finished,
so you should not be looking for further arguments. We use the symbol ■ to
indicate that a proof is finished.
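A finite search can never prove irrationality, but it can illustrate what the theorem rules out: exact rational arithmetic confirms that no fraction with small terms squares to exactly 2. A Python sketch (the bound of 200 is arbitrary):

```python
# Illustration (not a proof) of Theorem 5: exact arithmetic with
# fractions shows that no a/b with small terms satisfies (a/b)^2 = 2.
from fractions import Fraction

N = 200  # arbitrary search bound
for a in range(1, N):
    for b in range(1, N):
        assert Fraction(a, b) ** 2 != 2, f"found {a}/{b}"
print("no a/b with terms below", N, "satisfies (a/b)^2 = 2")
```

Note the use of Fraction rather than floating point: the proof concerns exact equality, which floating-point arithmetic cannot decide.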
1.6 Necessary Conditions and Sufficient Conditions
In mathematics you will often hear of “necessary conditions” and “sufficient
conditions”. For example a necessary condition for Mr. Smith to live in Montreal is that he live in the province of Quebec, while a sufficient condition for Mr.
Smith to live in the province of Quebec is that he live in Montreal. However,
living in Montreal is not a necessary condition for living in Quebec, and living
in Quebec is not a sufficient condition for living in Montreal. We have:
Definition 6 If
~B ⇒ ~A
or equivalently A ⇒ B, then B is a necessary condition for A.

Definition 7 Sufficient Conditions: If
A ⇒ B
or equivalently ~B ⇒ ~A, then A is a sufficient condition for B.
Remark 1: Thus a necessary condition for A is one which necessarily must be
satisfied if A is to be true, while if a sufficient condition for B is satisfied then
this guarantees the truth of B.
Remark 2: Thus if we have proven a statement of the form
A ⇒ B
then it follows that A is a sufficient condition for B and B is a necessary
condition for A.
Remark 3: It is important to realize that if A is sufficient for B it does not
follow that A is necessary for B. Similarly, if B is necessary for A it does not
follow that B is sufficient for A.
Example: We proved that:
“a is odd and b is odd” ⇒ “a + b is even”.
Thus a sufficient condition for the sum a + b to be an even number is that
a and b both be odd numbers, while a necessary condition for both a and b to
be odd is that a + b be even.
However, a + b being an even number is not a sufficient condition for both a
and b to be odd numbers, since a + b can be even without a and b both being
odd; for example, if a = 2 and b = 4 then a + b = 6.
Nor is a and b both being odd a necessary condition for a + b to be even; for
example, if a = 2 and b = 4 then a + b = 6.
1.7 Necessary and Sufficient Conditions
Sometimes it is possible to prove both A ⇒ B and B ⇒ A. In this case A
is a necessary and sufficient condition for B, and B is a necessary and sufficient
condition for A, since it is then true that:
A ⇒ B and ~A ⇒ ~B
B ⇒ A and ~B ⇒ ~A.
We therefore have:

Definition 8 If
A ⇒ B and B ⇒ A
then A is a necessary and sufficient condition for B and we write:
A ⇔ B.

Notice that with A ⇔ B the arrow points in both directions. This is to
indicate that the truth of A is communicated to B just as the truth of B is
communicated to A.
If you can prove A ⇔ B then you have a much stronger statement than
either A ⇒ B or B ⇒ A alone.
Example: Consider
A = “The integers a and b are odd numbers”
B = “The product of the integers, a × b, is an odd number”

Theorem 9 We have: a and b are odd numbers if and only if a × b is an odd
number, so that A ⇔ B.

Proof. Suppose a and b are odd, so that a = 2m + 1 and b = 2n + 1. It
follows that:
a × b = (2m + 1) × (2n + 1)
      = 4mn + 2m + 2n + 1
      = 2(2mn + m + n) + 1
so that a × b is odd. Now suppose that B is true, so that:
a × b = 2m + 1
and consider a proof by contradiction to show that A is true. Thus suppose A
is false, so that one of a and b is even. Without loss of generality suppose a is
even, so that a = 2n and hence:
2n × b = 2m + 1
⇒ n × b = m + 1/2.
Now n × b is an integer but m + 1/2 is not an integer, which is a contradiction.
It follows then that A is true.
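As before, a computation only illustrates the theorem; the sketch below (Python, with an arbitrary range) checks the ‘if and only if’ in both directions at once by comparing the two sides of the equivalence as booleans:

```python
# Illustration (not a proof) of Theorem 9 in both directions:
#   a and b are odd  <=>  a * b is odd.
def is_odd(m: int) -> bool:
    return m % 2 == 1

for a in range(-40, 40):
    for b in range(-40, 40):
        # The two sides of the biconditional must agree for every pair.
        assert (is_odd(a) and is_odd(b)) == is_odd(a * b)
print("the equivalence holds for every pair in the tested range")
```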
1.8 ‘Or’ and ‘And’
“You are sad,” the Knight said in an anxious tone: “let me sing you
a song to comfort you.” “Is it very long?” Alice asked, for she had
heard a good deal of poetry that day. “It’s long,” said the Knight,
“but it’s very, very beautiful. Everybody that hears me sing it – either
it brings the tears into their eyes, or else ” “Or else what?” said
Alice, for the Knight had made a sudden pause. “Or else it doesn’t,
you know.”
-Lewis Carroll, Through the Looking Glass
In life and in mathematics we often connect different statements using the
word “or”, which is denoted by ∨.
Example: If A is the statement “n is odd” and B is the statement “n > 10”
then A ∨ B means that either n is odd or n is greater than 10.
If n = 13 then A ∨ B would be true since n satisfies both A and B; it is odd
and greater than 10. If n = 7 then A ∨ B would also be true since n satisfies A
and we do not need to satisfy B. Similarly, if n = 22 then A ∨ B would be true
because n satisfies B and we do not need to satisfy A. The only way A ∨ B
can be false is if both A and B are false. Thus if n = 8 then A ∨ B would be
false.
Remark: The use of “or” here is the “inclusive or”, which is different from
the “exclusive or” that your mother used when she said: “You can either have
cake or pie”, which meant you can have cake, you can have pie, but you cannot
have both cake and pie. If your mother used the inclusive or then you could
also have cake and pie.
Here are some results involving the connector ∨:

Theorem 10 The Law of the Excluded Middle: For any statement A, the
statement:
A ∨ ~A
is true.

Theorem 11 For any statements A and B:
(A ⇒ B) ⇔ (~A ∨ B).

Another important connector of statements is “and”, which is denoted by
∧. Thus in the previous example A ∧ B means n is odd and n is greater than
10. For the statement A ∧ B to be true it must be then that both A and B are
true.
Negating statements involving ∨ is equivalent to negating each individual
statement and changing the ∨ to ∧. Similarly, negating statements involving ∧
is equivalent to negating each individual statement and changing the ∧ to ∨.
Thus:

Theorem 12 For any statements A and B:
~(A ∨ B) ⇔ ~A ∧ ~B
~(A ∧ B) ⇔ ~A ∨ ~B.
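Because Theorems 10 and 12 concern only the truth values of A and B, they can be verified over all four truth assignments, and for bare statements that check really is exhaustive. A Python sketch:

```python
# Exhaustive verification of Theorem 10 (excluded middle) and
# Theorem 12 (De Morgan's laws) over all four truth assignments.
from itertools import product

for A, B in product([True, False], repeat=2):
    assert A or (not A)                             # Theorem 10
    assert (not (A or B)) == ((not A) and (not B))  # Theorem 12
    assert (not (A and B)) == ((not A) or (not B))  # Theorem 12
print("the laws hold for all four truth assignments")
```

Note that Python's `or` is the inclusive or of the text: `True or True` is `True`.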
1.9 The Quantifiers ∃ and ∀
Sometimes in mathematics we use the quantifier ∃ to say that something exists.
For example, to express the idea that the integer a is odd we might write:
∃n | (a = 2n + 1)
which says that there exists an integer n such that a = 2n + 1.
Other times we wish to make a universal statement that all members of some
class have a property, using the quantifier ∀. For example we might write:
∀n | (n > n − 1)
which says that all integers n are greater than n − 1.
In intermediate mathematics and economics the symbols ∃ and ∀ are sometimes used as a convenient short-hand but are not that important. They do get
used a lot in advanced mathematics and economics.
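The two quantifiers have direct computational analogues, at least over a finite range: `any()` plays the role of ∃ and `all()` the role of ∀. A Python sketch (the ranges are arbitrary):

```python
# Finite analogues of the quantifiers: any() for ∃, all() for ∀.
a = 17
# ∃n | (a = 2n + 1): some n in the range witnesses that a is odd.
assert any(a == 2 * n + 1 for n in range(-100, 100))

# ∀n | (n > n - 1): every n in the range satisfies the inequality.
assert all(n > n - 1 for n in range(-100, 100))
print("existential and universal statements checked on a finite range")
```

The limitation is the same as before: a computer can only quantify over finitely many cases, whereas ∃ and ∀ range over all integers.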
1.10 Proof by Counter-Example
In mathematics we are often led by our intuition to believe, without a proof, that
something is always true. We therefore form a guess or a conjecture that this
statement is always true.
Example: In the seventeenth century the French mathematician Fermat conjectured that if n is an integer then all numbers of the form:
n
22 + 1
are prime numbers. Thus with n = 0; 1; 2; 3; 4 we have:
2^(2^0) + 1 = 3,  2^(2^1) + 1 = 5,  2^(2^2) + 1 = 17,  2^(2^3) + 1 = 257,  2^(2^4) + 1 = 65537

and it is a fact that 3, 5, 17, 257 and 65537 are all prime numbers.

(An integer n is a prime number if its only divisors are 1 and n. Thus 5
is prime because only 1 and 5 divide evenly into 5, while 9 is not prime since
9/3 = 3.)
Since we do not know if a conjecture is true or false, there are two strategies
for dealing with a conjecture:
1. Prove the conjecture is true.
2. Use a proof by counter-example to find one case where the conjecture is
false.
The first strategy is generally the most difficult since we have to prove that
something holds for an infinite number of cases. For Fermat's conjecture it would
very likely be a deep and difficult proof that would show that all numbers of
the form 2^(2^n) + 1 are prime. Of course if the conjecture really is true this is the
only strategy that will lead to success.

Often however you will find that no matter how hard you try you cannot
prove a conjecture. In this case you might try the second strategy and search
for a counter-example. If you are lucky this can be much easier since, unlike the
first strategy, you only need one counter-example to prove the conjecture false.

Fermat died without being able to prove his conjecture. Later Euler was
able to show that Fermat's conjecture is in fact false since for n = 5:

2^(2^5) + 1 = 4294967297 = 641 × 6700417

and so 2^(2^5) + 1 is not prime.
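Euler's counter-example is easy to check with exact integer arithmetic, and the first five Fermat numbers can be confirmed prime by trial division. A quick Python sketch (the helper function is ours):

```python
# The fifth Fermat number F_5 = 2**(2**5) + 1.
F5 = 2**(2**5) + 1
assert F5 == 4294967297

# Euler's factorization shows F_5 is not prime.
assert 641 * 6700417 == F5
assert F5 % 641 == 0

def is_prime(m):
    """Primality by trial division; fine for numbers this small."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# The cases n = 0, ..., 4 that Fermat checked really are prime.
assert all(is_prime(2**(2**n) + 1) for n in range(5))
```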
1.11 Proof by Induction

Just the place for a Snark! I have said it twice:
That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
What I tell you three times is true.
-Lewis Carroll, The Hunting of the Snark

Often students attempt to prove results by simply listing the first few cases,
verifying that the statement is true, and then by putting a '...' or an 'etc.'
afterwards. For example suppose you wished to prove the following conjecture:
Conjecture: The sum of the first n integers is:

1 + 2 + 3 + ··· + n = n(n + 1)/2.

Thus one might write:

1 = 1(1 + 1)/2 = 1
1 + 2 = 2(2 + 1)/2 = 3
1 + 2 + 3 = 3(3 + 1)/2 = 6
etc.
and concludes that the statement is true. As we should now know this is
incorrect since it does not exclude the possibility that the conjecture is false for
n = 4 or at some really huge number like n = 10^(10^10).

A correct method to prove these kinds of conjectures is proof by induction. A
proof by induction proceeds as follows. We are given a sequence of statements
S_1, S_2, ... and we want to prove that each S_i is true. For example we wish to
prove that S_n is true where S_n is the statement:

S_n = "1 + 2 + 3 + ··· + n = n(n + 1)/2".

Proof by induction proceeds in two steps:

Proof by Induction

1. Prove that S_1 is true. This is usually trivial, involving nothing more than
a mere calculation.

2. Assume that S_1, S_2, ..., S_{n−1} are true (this is called the induction
hypothesis), and use this to prove that S_n is true.
A proof by induction is very much like setting up an infinite row of dominos.
To get every domino to fall over two things are needed. First, one must tip over
the first domino to get the chain reaction started. This corresponds to the first
step in the proof by induction. Next, the dominos must be spaced so that if
one domino falls then its next neighbour must also fall. This corresponds to the
second part of the proof. Together they imply that all of the dominos will fall
down.
Example: Consider proving:

1 + 2 + 3 + ··· + n = n(n + 1)/2.

Proof. The first step is to verify that it is true for n = 1, which is easy
since:

1 = 1(1 + 1)/2.

Now assume the induction hypothesis, that the statement is true up to n − 1, so
that in particular:

1 + 2 + 3 + ··· + (n − 1) = (n − 1)((n − 1) + 1)/2 = n(n − 1)/2.
Now we need to prove the statement is true for S_n. We have:

(1 + 2 + 3 + ··· + (n − 1)) + n = n(n − 1)/2 + n
= n((n − 1)/2 + 1)
= n((n − 1 + 2)/2)
= n(n + 1)/2.
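The closed form proved by induction can also be spot-checked numerically for many values of n; a small Python sketch (ours, not the text's):

```python
# Verify 1 + 2 + ... + n = n(n + 1)/2 for the first thousand n.
total = 0
for n in range(1, 1001):
    total += n                       # the running sum 1 + 2 + ... + n
    assert total == n * (n + 1) // 2

print(total)  # → 500500
```

Of course a finite check like this is exactly what the text warns is not a proof; the induction argument above is what covers every n.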
1.12 Functions

The basic mathematical object that we will be working with is a function,
defined as:

Definition 13 Function: A function y = f(x) is a rule which assigns a
unique number y to every allowed value of x.
The key requirement here is that there be a unique y for every x. For
example the function:

y = f(x) = x^2

assigns to the value x = 2 the unique value y = 2^2 = 4.

An example of something which is not a function is:

y = f(x) = √x

since to x = 4 it assigns two values, y = 2 and y = −2, while to x = −4 it
assigns no value since √−4 is not defined. Similarly, without any restrictions on
x, f(x) = 1/x is not a function since f(0) = 1/0 is not defined.
Implicit in any definition of a function is its domain and range:

Definition 14 Domain: The domain of a function y = f(x) is the set of
all values of x for which f(x) is defined.

Definition 15 Range: The range of a function y = f(x) is the set of possible
y values over the domain of the function.
Often we can ensure that f(x) is a function by restricting its domain and
range.

Example: The problem with

f(x) = √x
can be fixed by: 1) restricting the domain to be x ≥ 0 and 2) restricting the
range to be y ≥ 0, or in other words interpreting √ as the positive square root
(e.g. √4 = 2 and not −2). With these restrictions we have a perfectly good
function, as can be seen by the plot below:

[Plot: y = √x for x ≥ 0, 0 ≤ x ≤ 35]

This is actually an example of a Cobb-Douglas production function, one of the
workhorses of economic theory.
Similarly the problem with f(x) = 1/x can be fixed by restricting the domain
to be x > 0, in which case we have:

[Plot: f(x) = 1/x for 0 < x ≤ 5]
Remark: Quite often we define the range and domain in a way that ensures
that the function makes economic sense. If for example

Q = f(P)

is a demand function with P the price and Q the quantity demanded, the domain
of f(P) will be P ≥ 0 and the range Q ≥ 0 since prices and quantities cannot
be negative.
1.12.1 Integer Exponents
An important class of functions takes the form:

f(x) = x^n

where n is an integer. The meaning of x^n for n > 0 is simply x multiplied by
itself n times. For example:

x^3 = x × x × x.

In this case we can allow the domain of f(x) to be all x, that is:

−∞ < x < ∞.

We can also allow negative integer exponents (i.e., −1, −2, −3, ...). By x^(−n)
we mean 1/x^n. For example:

x^(−3) = 1/x^3 = (1/x) × (1/x) × (1/x).

Note that for negative integer exponents we need to exclude x = 0 from the
domain of the function since 1/0 is not defined.
Integer exponents obey the following rules, which you might want to prove
on your own:

Theorem 16 If m and n are either positive or negative integers then

1. x^m x^n = x^(m+n)
2. (x^m)^n = x^(mn)
3. x^0 = 1
4. x^(−n) = 1/x^n
5. (xy)^n = x^n y^n.
Proof. To prove 1 for example we have:

x^m x^n = (x × x × ··· × x) × (x × x × ··· × x), with m factors in the first bracket and n in the second,
= x × x × ··· × x, with m + n factors,
= x^(m+n).

The results 2 and 5 can be proven in a similar manner. To prove x^0 = 1 from
1 and 4 we have: from 1 with n = −m:

x^m x^(−m) = x^(m−m) = x^0

while from 4 we have x^(−m) = 1/x^m and so:

x^0 = x^m x^(−m) = x^m (1/x^m) = 1.

Remark: Note that (x + y)^n ≠ x^n + y^n. For example with n = 2:

(x + y)^2 = x^2 + 2xy + y^2 ≠ x^2 + y^2.
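The rules of Theorem 16 can be spot-checked numerically for particular values and exponents; a small Python sketch (ours, not the text's):

```python
from math import isclose

x, y = 3.0, 2.0
for m in range(-3, 4):
    for n in range(-3, 4):
        assert isclose(x**m * x**n, x**(m + n))   # Rule 1
        assert isclose((x**m)**n, x**(m * n))     # Rule 2
        assert isclose((x * y)**n, x**n * y**n)   # Rule 5
assert x**0 == 1                                  # Rule 3
assert isclose(x**-2, 1 / x**2)                   # Rule 4

# The Remark: (x + y)^2 is x^2 + 2xy + y^2, not x^2 + y^2.
assert (x + y)**2 == x**2 + 2*x*y + y**2
assert (x + y)**2 != x**2 + y**2
```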
1.12.2 Polynomials

A polynomial is a weighted sum of powers of x, defined as:

Definition 17 Polynomial: An nth degree polynomial is a function of the
form:

f(x) = a_n x^n + a_{n−1} x^(n−1) + ··· + a_1 x + a_0

where a_n ≠ 0.
An important property of a polynomial is its roots:

Definition 18 The roots of a function f(x) are those values r which satisfy
f(r) = 0.

For a polynomial a root r satisfies:

f(r) = a_n r^n + a_{n−1} r^(n−1) + ··· + a_1 r + a_0 = 0.

One of the most important results in mathematics is that a polynomial of
degree n has n (possibly complex) roots. This is important enough that it is
called the Fundamental Theorem of Algebra. It was first proved by Gauss.

Theorem 19 Fundamental Theorem of Algebra: An nth degree polynomial
has n roots r_1, r_2, ..., r_n; that is, n (possibly complex²) solutions to the equation

f(r) = a_n r^n + a_{n−1} r^(n−1) + ··· + a_1 r + a_0 = 0.
Two important special cases are:

Definition 20 A linear function is a 1st degree polynomial:

y = f(x) = ax + b.

Definition 21 A quadratic is a 2nd degree polynomial:

y = f(x) = ax^2 + bx + c.

² A complex number is of the form a + bi where i = √−1.
Example 1: A 1st degree polynomial f(x) = ax + b has one root:

r = −b/a

as the solution to f(r) = ar + b = 0. Thus

f(x) = 4x + 8

has a single root at r = −2 as illustrated below:

[Plot: f(x) = 4x + 8 for −4 ≤ x ≤ 4, crossing the x axis at x = −2]
Example 2: We have:

Theorem 22 The quadratic

f(x) = ax^2 + bx + c

has two roots r_1 and r_2 given by:

r_1 = (−b + √(b^2 − 4ac))/(2a)  and  r_2 = (−b − √(b^2 − 4ac))/(2a).

Thus the quadratic:

x^2 − 9x + 14

has two roots:

r = (−(−9) ± √((−9)^2 − 4(1)(14)))/2

or:

r_1 = 2 and r_2 = 7
as can also be seen by the graph below where f(x) crosses the x axis:

[Plot: f(x) = x^2 − 9x + 14 for −4 ≤ x ≤ 12, crossing the x axis at x = 2 and x = 7]
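The quadratic formula of Theorem 22 translates directly into code; a small Python sketch (the function name is ours):

```python
from math import sqrt

def quadratic_roots(a, b, c):
    """Both roots of a x^2 + b x + c, assuming b^2 - 4ac >= 0."""
    disc = sqrt(b**2 - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# The text's example: x^2 - 9x + 14 has roots 2 and 7.
r1, r2 = quadratic_roots(1, -9, 14)
print(sorted([r1, r2]))  # → [2.0, 7.0]
```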
An implication of the fundamental theorem of algebra is that a polynomial
can always be factored as follows:

Theorem 23 Let r_1, r_2, ..., r_n be the n roots of the polynomial

f(x) = a_n x^n + a_{n−1} x^(n−1) + ··· + a_1 x + a_0.

Then f(x) can be factored as:

f(x) = a_n (x − r_1) × (x − r_2) × ··· × (x − r_n).

Example 1: The quadratic:

f(x) = 3x^2 − 27x + 60

has two roots r_1 = 5 and r_2 = 4 so that

f(x) = 3(x − 5)(x − 4)

which you can verify by multiplying out the second expression.

Example 2: The cubic:

x^3 − 19x^2 + 104x − 140
has roots at r_1 = 2, r_2 = 7 and r_3 = 10 as can be seen by the graph below:

[Plot: f(x) = x^3 − 19x^2 + 104x − 140 for 0 ≤ x ≤ 12, crossing the x axis at x = 2, 7 and 10]

or by noting that it can be factored as:

x^3 − 19x^2 + 104x − 140 = (x − 2)(x − 7)(x − 10).
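Both the roots and the factorization of the cubic can be verified by direct evaluation with exact integer arithmetic; a Python sketch (ours, not the text's):

```python
def f(x):
    return x**3 - 19 * x**2 + 104 * x - 140

def factored(x):
    return (x - 2) * (x - 7) * (x - 10)

# Each claimed root makes f vanish...
assert all(f(r) == 0 for r in (2, 7, 10))
# ...and the factored form agrees with f at every integer we check.
assert all(f(x) == factored(x) for x in range(-10, 11))
```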
1.12.3 Non-integer Exponents

In economics we will often want to consider non-integer exponents, that is
f(x) = x^a where a is not an integer.

Example: Two functions with non-integer exponents are

f(x) = x^0.3143 and Q = f(L) = L^(1/2)

where in the first case a = 0.3143 and in the second a = 0.5. The latter case is
an example of a Cobb-Douglas production function.

For non-integer exponent functions y = x^a we run into very difficult and
deep mathematical waters if we allow either x or y to be negative. For example
with f(x) = x^(1/2), if we allow x = −1 then f(−1) = √−1 is not defined, while if
x = 4 and we allow y < 0 then y = √4 = 2 and y = √4 = −2.

For this reason, whenever we work with y = x^a with an exponent a which is
not an integer we always assume that x > 0 and that y > 0.

With this qualification non-integer exponents obey the same rules as
integer exponents. Thus:
Theorem 24 If x > 0 and a is any number (integer or non-integer, negative
or positive) then x^a is defined and:

1. x^a > 0
2. x^a x^b = x^(a+b)
3. (x^a)^b = x^(ab)
4. (xy)^a = x^a y^a
5. x^0 = 1
6. x^(−a) = 1/x^a.
Often we will need to find the unique positive root of the function:

f(x) = Ax^b − c

for x > 0 and where b is not an integer. We have:

Theorem 25 The unique positive root f(r) = Ar^b − c = 0 is given by:

r = (c/A)^(1/b).

Proof. Since r satisfies:

Ar^b = c

we have:

r^b = c/A.

To get r by itself we take both sides to the power 1/b to get:

(r^b)^(1/b) = (c/A)^(1/b)

and since (r^b)^(1/b) = r^(b(1/b)) = r^1 = r we have:

r = (c/A)^(1/b).

Example: Given:

f(x) = 10x^7.3 − 23

where A = 10, b = 7.3 and c = 23 we find that:

r = (23/10)^(1/7.3) = 1.121.
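Theorem 25 is a one-line computation; a Python sketch of the example (ours, not the text's):

```python
A, b, c = 10, 7.3, 23

# The unique positive root of f(x) = A x^b - c is r = (c/A)^(1/b).
r = (c / A) ** (1 / b)
print(round(r, 3))  # → 1.121

# Check it really is a root, up to floating-point accuracy.
assert abs(A * r**b - c) < 1e-9
```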
This can also be seen in the plot below:

[Plot: f(x) = 10x^7.3 − 23 for 0 ≤ x ≤ 1.4, crossing the x axis near x = 1.12]

1.12.4 The Geometric Series
An important result in economics is the geometric series:

Theorem 26 Finite Geometric Series: If x ≠ 1 then

1 + x + x^2 + x^3 + ··· + x^(n−1) = (1 − x^n)/(1 − x).

Proof. Let S be given by:

S = 1 + x + x^2 + x^3 + ··· + x^(n−1).

If we multiply S by x we obtain:

xS = x + x^2 + x^3 + ··· + x^n

and if we subtract these two equations from each other we obtain:

S − xS = (1 − x)S = (1 + x + x^2 + ··· + x^(n−1)) − (x + x^2 + ··· + x^n) = 1 − x^n

so that assuming x ≠ 1 and solving for S yields the finite geometric series.

Now consider letting n → ∞. If −1 < x < 1 then x^n → 0 so that:

Theorem 27 Infinite Geometric Series: If −1 < x < 1 then

1/(1 − x) = 1 + x + x^2 + x^3 + ···.
Example 1: If x = 1/2 then:

2 = 1/(1 − 1/2) = 1 + 1/2 + (1/2)^2 + (1/2)^3 + ···.

Thus if you have 2 pies in the fridge and each day you eat 1/2 of the pie in the
fridge, eating 1 pie the first day, 1/2 of a pie the second, 1/4 of a pie the third etc.,
you will eventually eat all the pie in the fridge.
Example 2: Suppose you have a bond that pays $a a year forever and the
interest rate is r > 0. The price of the bond is then the present discounted
value:

P_B = a/(1 + r) + a/(1 + r)^2 + a/(1 + r)^3 + ···.

We have:

P_B = (a/(1 + r)) × (1 + 1/(1 + r) + 1/(1 + r)^2 + ···) = (a/(1 + r)) × (1 + x + x^2 + ···)

so the term in brackets is just the geometric series with x = 1/(1 + r), with
0 < x < 1. Then from the geometric series:

P_B = (a/(1 + r)) × 1/(1 − x)
= (a/(1 + r)) × 1/(1 − 1/(1 + r))
= (a/(1 + r)) × ((1 + r)/r)
= a/r.

Thus with an interest rate of r = 0.05/year (or 5% per year), a bond that
paid a = $20 per year forever would be worth:

P_B = ($20/year)/(0.05/year) = $400.
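The perpetuity formula can be checked by summing the discounted payments directly and watching the partial sums approach a/r; a Python sketch (ours, not the text's):

```python
a, r = 20.0, 0.05

def pv(T):
    """Present value of $a per year for T years, discounted at rate r."""
    return sum(a / (1 + r) ** t for t in range(1, T + 1))

# As T grows the partial sums approach the perpetuity price a/r = 400.
assert abs(pv(1000) - a / r) < 1e-6
print(round(pv(1000), 2))  # → 400.0
```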
Chapter 2
Univariate Calculus
2.1 Derivatives

2.1.1 Slopes
You can think of a function f (x) as a system of mountains and valleys with x
denoting your position along the x axis (say how far east or west from some
point) and y your height above the x axis (or how high you are above sea level).
This is illustrated below:
[Figure: Mountains and Valleys]
An important consideration for both a hiker and an economist then is the
slope. Hikers clearly care if they are going uphill or downhill, and economists
care if a function is upward sloping (as a supply curve) or downward sloping (as
a demand curve).

The slope at any point x can be measured by moving Δx (say Δx = 5 or 5
feet to the right), measuring the change in elevation by Δy (say Δy = −20 or 20
feet down) and taking the ratio Δy/Δx to get the slope (here Δy/Δx = −20/5 = −4 so
for every foot forward you fall 4 feet, with the negative indicating a downward
slope). This leads to the following definition:
Definition 28 Slope: The slope of f(x) at x for a given change in x, Δx, is
denoted by Δy/Δx and is:

Δy/Δx ≡ (f(x + Δx) − f(x))/Δx.

If Δy/Δx > 0 the function is upward sloping so that increasing (decreasing) x
leads to an increase (decrease) in y. If Δy/Δx < 0 the function is downward sloping
so that increasing (decreasing) x leads to a decrease (increase) in y.
Example: If f(x) = x^2 and we want to measure the slope at x = 1 with
Δx = 2 then we obtain:

Δy/Δx = ((x + Δx)^2 − x^2)/Δx = ((1 + 2)^2 − 1^2)/2 = 4.

On the other hand if we use Δx = 0.25 we obtain:

Δy/Δx = ((x + Δx)^2 − x^2)/Δx = ((1 + 0.25)^2 − 1^2)/0.25 = 2.25

while if we use Δx = 0.001 we obtain:

Δy/Δx = ((1 + 0.001)^2 − 1^2)/0.001 = 2.001.

Note that as we make Δx smaller the slope appears to be approaching 2.
In general we have:

Theorem 29 For f(x) = x^2 the slope is:

Δy/Δx = 2x + Δx.
2.1.2 Derivatives

And what are these derivatives? ... They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not
call them ghosts of departed quantities? -George Berkeley
A problem with slopes is that we may get a different slope depending on
which Δx we choose. For example with x^2 at x = 1 the slope is 2 + Δx and so
we obtained slopes of 2.25 and 2.001 for Δx = 0.25 and Δx = 0.001.

Since we get different slopes for different Δx's, one might wonder
whether there is a best choice of Δx. This question has a surprising answer.
It turns out that the best choice is to make Δx zero, so that we let:

Δx → 0.

Remark: Sometimes rather than 0 it is better to follow the earlier inventors of
calculus and imagine that:

Δx → dx

where dx (note that Δ, delta, is the Greek letter for d), known as an infinitesimal,
is a quantity that is as close to zero as possible without actually being 0. As
we make our step Δx infinitesimally small, the amount by which we rise or fall
will also get infinitesimally small so that:

Δy → dy.

The ratio of the two, however, or the slope, will approach something sensible, the
derivative of the function:

dy/dx = f′(x).
The use of in…nitesimals was frowned upon by many such as the English
philosopher Berkeley. It was not until over a hundred years after the invention
of calculus that Cauchy was able to provide foundations for calculus that did
not require the use of in…nitesimals. In…nitesimals are nevertheless a real aid to
intuition and especially in applied work they get used all the time.
Example: If y = x^2 we have from Theorem 29 that as Δx → 0:

Δy/Δx = 2x + Δx → 2x + dx = 2x

where we ignore the dx because it is so small. This gives the well-known result,
obtained by multiplying by the exponent and subtracting one from the exponent,
that:

dy/dx = d(x^2)/dx = 2x.

Thus at x = 1 we obtain a slope of:

dy/dx = f′(1) = 2.

In general a derivative is defined as:
Definition 30 Derivative: The derivative of a function y = f(x), denoted
by f′(x) or dy/dx, is the limit of the slope as Δx → 0, or:

lim_{Δx→0} Δy/Δx = lim_{Δx→0} (f(x + Δx) − f(x))/Δx.

A graphical depiction of the difference between a slope and a derivative is
given below:

[Figure: the slope over a finite step Δx versus the derivative, the tangent at x]
The first rule you learn in calculus is how to calculate the derivative of x^n
as:

Theorem 31 Given f(x) = x^n then:

f′(x) = n x^(n−1).

Example: Given f(x) = x^7 then:

f′(x) = 7x^(7−1) = 7x^6.
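The power rule can be checked against the limit definition by taking a very small step; a Python sketch (ours, not the text's):

```python
def numeric_derivative(f, x, dx=1e-7):
    # Approximate f'(x) by the slope over a very small step dx.
    return (f(x + dx) - f(x)) / dx

f = lambda x: x**7
analytic = lambda x: 7 * x**6   # the power rule of Theorem 31

for x in (0.5, 1.0, 2.0):
    assert abs(numeric_derivative(f, x) - analytic(x)) < 1e-3
```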
2.1.3 The Use of the Word 'Marginal' in Economics
In economics we often use the word marginal; for example the marginal product
of labour, the marginal utility of apples, the marginal propensity to consume
and so on.

The original meaning of 'marginal' in, say, the marginal product of labour
was the effect of adding one more unit of labour L, at the margin, to output
Q. Translating this into mathematics, if we write the production function as
Q = f(L) then the marginal product of labour is:

MP_L ≡ ΔQ/ΔL = (f(L + 1) − f(L))/1

where ΔL = 1 and ΔQ = f(L + 1) − f(L). Thus the marginal product of
labour is the slope ΔQ/ΔL when ΔL = 1.

In advanced economics we want to have the tools of calculus at our disposal.
For this reason it is much more convenient to use derivatives rather than slopes
to measure the marginal product of labour. Consequently instead of setting
ΔL = 1 and using ΔQ/ΔL we let ΔL → 0 and use the derivative dQ/dL = f′(L).

This refinement of the notion of marginal extends now to all marginal
concepts so that today:

Definition 32 In economics when we refer to marginal concepts we mean the
derivative.
Example 1: Given the Cobb-Douglas production function:

Q = f(L) = L^(1/2)

the marginal product of labour is the derivative of f(L), or:

MP_L(L) ≡ f′(L) = (1/2) L^(−1/2).

Example 2: Given a utility function for, say, apples Q:

U(Q) = Q^(2/3)

the marginal utility of apples is:

MU(Q) ≡ U′(Q) = (2/3) Q^(−1/3).
2.1.4 Elasticities
Often economists work with elasticities rather than derivatives. The problem
with derivatives is that they depend on the units in which x and y are measured.
Suppose for example we have a demand curve:

Q = 100 − 3P$

where the price P$ is measured in dollars and the derivative is dQ/dP = −3. If we
decide to measure the price instead in cents Pc we have, using P$ = Pc/100:

Q = 100 − (3/100)Pc

so now dQ/dP = −3/100. Thus a change in units causes the derivative dQ/dP to
change from −3 to −3/100.

Elasticities avoid this problem by working with percentage changes. While
the slope is the change in y divided by the change in x, or Δy/Δx, the elasticity η
is the percentage change in y, (Δy/y) × 100%, divided by the percentage change
in x, (Δx/x) × 100%, or:

η ≡ ((Δy/y) × 100%) / ((Δx/x) × 100%) = (Δy/Δx)(x/y).

Notice that here the elasticity η is the slope Δy/Δx multiplied by x/y. This is known
as an arc elasticity and is typically used in elementary economics.
In more advanced economics we let Δx → 0 and use the derivative dy/dx in the
elasticity rather than the slope Δy/Δx. This leads to the point elasticity, or
simply the elasticity:

Definition 33 Elasticity: The elasticity of the function y = f(x) at x, denoted
by η(x), is:

η(x) ≡ (dy/dx)(x/y) ≡ f′(x) × x/f(x).
An easy way to remember the formula for the elasticity is to follow the
following recipe:

Elasticity Recipe

1. Write down the derivative as dy/dx.

2. Note that with the derivative y is upstairs and x is downstairs. To obtain
the elasticity put the y downstairs and the x upstairs as x/y, and multiply this
with 1 as:

η = (dy/dx) × (x/y).

3. Now to obtain the elasticity as a function of x replace y with f(x) to
obtain:

η(x) = f′(x) × x/f(x).
Remark 1: In economics typically x > 0 and y > 0. This means that the
derivative f′(x) and the elasticity η(x) always have the same sign. Thus if the
elasticity of demand is negative this is equivalent to saying the demand curve
slopes downwards.

Remark 2: If η(x) = −2 then a 1% increase in x leads to a 2% decrease in y. If
η(x) = 3 then a 1% increase in x leads to a 3% increase in y.

Remark 3: An elasticity can be calculated for any function y = f(x), not just
for demand curves.
Example 1: If

y = 4 − 2x

(a demand curve perhaps, if y = Q and x = P) then following the recipe:

1. We first write down the slope as:

dy/dx = −2.

2. Since there is a y upstairs and an x downstairs we multiply this by x/y as:

η = −2 (x/y).

3. To obtain the elasticity as a function of x replace y with 4 − 2x as:

η(x) = −2 (x/y) = −2x/(4 − 2x).
Thus at x = 1/2 we have:

η(1/2) = (−2x/(4 − 2x)) evaluated at x = 1/2, which gives −1/3

so that at x = 1/2 a 1% increase in x leads to a 0.33% decrease in y.

Notice that while the derivative is −2 for all x, the elasticity decreases as x
increases, as shown in the plot below:

[Plot: η(x) = −2x/(4 − 2x) for 0 ≤ x ≤ 1.8]
Example 2: If

y = x^2 + 5

then following the recipe:

1. We first write down the slope as:

dy/dx = 2x.

2. Since there is a y upstairs and an x downstairs we multiply this by x/y as:

η = 2x × (x/y).

3. To obtain the elasticity as a function of x replace y with x^2 + 5 as:

η(x) = 2x × x/(x^2 + 5) = 2x^2/(x^2 + 5).
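The recipe can be wrapped in a small function and checked against both examples; a Python sketch (names are ours, not the text's):

```python
def elasticity(f, fprime, x):
    """Point elasticity eta(x) = f'(x) * x / f(x) of Definition 33."""
    return fprime(x) * x / f(x)

# Example 1: y = 4 - 2x has eta(1/2) = -1/3.
eta1 = elasticity(lambda x: 4 - 2 * x, lambda x: -2, 0.5)
assert abs(eta1 - (-1 / 3)) < 1e-12

# Example 2: y = x^2 + 5 has eta(x) = 2x^2 / (x^2 + 5); check at x = 3.
eta2 = elasticity(lambda x: x**2 + 5, lambda x: 2 * x, 3.0)
assert abs(eta2 - 2 * 9 / (9 + 5)) < 1e-12
```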
2.1.5 The Constant Elasticity Functional Form
Generally speaking the elasticity η(x) changes with x. Now with slopes we know
there exists a function

f(x) = ax + b

where the derivative f′(x) = a does not change with x, although the elasticity:

η(x) = f′(x) × x/y = ax/(ax + b)

does change with x.

Given the importance of elasticities in economics, a natural question to ask
is whether there is a functional form which has the property that the elasticity η(x)
does not change with x. This is often convenient since it means, for example,
that a demand curve has the same elasticity no matter what the price.

The functional form which has this property is

f(x) = Ax^b.

In fact we have an even stronger result:

Theorem 34 A function f(x) has the same elasticity for all x if and only if it
can be written as:

f(x) = Ax^b.
Proof. To prove that f(x) = Ax^b has a constant elasticity note that:

f′(x) = bAx^(b−1)

and consequently:

η(x) = f′(x) × x/f(x) = bAx^(b−1) × x/(Ax^b) = b.

To prove that η(x) = b ⟹ f(x) = Ax^b requires either integral calculus or
differential equations and so we omit it here.
Example 1: The demand curve

Q = 1000P^(−3)

has the functional form Ax^b and hence the elasticity of demand is η(P) = −3,
the exponent on P.

Example 2: If we add a constant 10 to the demand curve in Example 1 as:

Q = 1000P^(−3) + 10
then the demand curve no longer has the functional form Ax^b and so does not
have the same elasticity for all P. In fact:

η(P) = −3000P^(−3)/(1000P^(−3) + 10) = −3/(1 + P^3/100)
and so changing P changes the elasticity as illustrated in the plot below:

[Plot: η(P) = −3/(1 + P^3/100) for 0 ≤ P ≤ 5]

2.1.6 Local and Global Properties
When a hiker says: “just after the stream the trail climbs” he is talking about a
local property of the trail since it may be that later on the trail descends. On
the other hand if he says: “it rained today and the entire trail is muddy” he is
making a global statement, one that applies to the entire trail.
When we refer to functions we also will want to distinguish between local
and global properties. We have:
Definition 35 Local Properties: We say f(x) has some property locally at
x_0 if there is a neighborhood around x_0 (perhaps very small) where f(x) has
that property.

Definition 36 Global Properties: We say f(x) has some property globally if
the function has that property for all x in the domain of f(x).
Making a global statement is always stronger than making a local statement.
If the trail is globally muddy, then it follows that it is locally muddy just after
the stream. However if it is locally muddy just after the stream, it does not
follow that it is globally muddy. This clearly also holds for functions and so:
Theorem 37 If A is the statement "f(x) has property P globally" and B is
the statement "f(x) has property P locally" then

A ⟹ B

but it is not true that B ⟹ A.
A function can be either locally or globally increasing or decreasing according
to the following definitions:

Definition 38 Locally Increasing: If f′(x_0) > 0 the function is locally
increasing (upward sloping) at x = x_0.

Definition 39 Globally Increasing (or Monotonic): If f′(x) > 0 for all x
in the domain of f(x) then the function is globally increasing or monotonic.

Definition 40 Locally Decreasing: If f′(x_0) < 0 the function is locally
decreasing (or downward sloping) at x = x_0.

Definition 41 Globally Decreasing: If f′(x) < 0 for all x in the domain of
f(x) then the function is globally decreasing.
Example 1: Demand curves are globally downward sloping while supply
curves are globally upward sloping or monotonic.

Example 2: Consider the function graphed below:

[Plot: a function on 0 ≤ x ≤ 12 that rises for x < 6 and falls for x > 6]

This function is locally increasing at, say, x_0 = 4 and in general for any x < 6.
It is locally decreasing at x_0 = 8, and in general for any x > 6. Since it increases
for some x and decreases for others, it is neither globally increasing nor globally
decreasing.
Example 3: Consider the function graphed below:

[Plot: an increasing function on 0 ≤ x ≤ 4]

You can verify from the graph that this function is locally increasing at x_0 = 1
and at x_0 = 3. In fact it is increasing for all x and so this function is globally
increasing or monotonic.
2.1.7 The Sum, Product and Quotient Rules

There are a small number of rules to remember when calculating derivatives.
Three of the more important are the sum, product and quotient rules
given below:

Theorem 42 Sum Rule: If h(x) = af(x) + bg(x) where a and b are constants
then:

h′(x) = af′(x) + bg′(x).

Theorem 43 Product Rule: If h(x) = f(x)g(x) then:

h′(x) = f′(x)g(x) + f(x)g′(x).

Theorem 44 Quotient Rule: If h(x) = f(x)/g(x) then:

h′(x) = (g(x)f′(x) − f(x)g′(x))/g(x)^2.
Example 1: Given

f(x) = 3x^5 + 4x^3

we have from the sum rule that:

f′(x) = 3 d(x^5)/dx + 4 d(x^3)/dx = 15x^4 + 12x^2.
Example 2: Given

h(x) = x^2 f(x)

then from the product rule we have:

h′(x) = 2x f(x) + x^2 f′(x).

Example 3: Suppose that P(Q) is the inverse demand curve that a monopolist
faces with

P′(Q) < 0

so that the inverse demand curve slopes downwards. Total revenue (or sales) as
a function of Q is then equal to:

R(Q) = P(Q) × Q.

Marginal revenue then is defined by:

MR(Q) ≡ R′(Q).

Using the product rule we obtain:

MR(Q) = P′(Q)Q + P(Q)

where the first term P′(Q)Q is negative since P′(Q) < 0 and Q > 0. It follows that:

MR(Q) < P(Q)

so that the marginal revenue curve is always less than price. This divergence
between price and marginal revenue is the reason why a monopolist produces at
a lower level of output than is socially optimal (more precisely, Pareto optimal).

For a perfectly competitive firm on the other hand P does not depend on Q
and hence is a constant. Since the derivative of a constant is 0 it follows that
P′(Q) = 0 and hence:

MR(Q) = P.
Example 4: Suppose we want to calculate the derivative of

h(x) = f(x)/x^2.

Then from the quotient rule we have:

h′(x) = (x^2 f′(x) − 2x f(x))/(x^2)^2.
Example 5: If C(Q) is the firm's cost function, then marginal cost is given by:

MC(Q) ≡ C′(Q)

while average cost is:

AC(Q) ≡ C(Q)/Q.

Differentiating AC(Q) and using the quotient rule we find that:

AC′(Q) = (Q C′(Q) − C(Q))/Q^2
= (1/Q)(C′(Q) − C(Q)/Q)
= (MC(Q) − AC(Q))/Q.

From this we see that:

AC′(Q) > 0 ⟺ MC(Q) > AC(Q)
AC′(Q) < 0 ⟺ MC(Q) < AC(Q)

so that the average cost curve is increasing, or AC′(Q) > 0, when marginal cost
exceeds average cost, and the average cost curve is decreasing, or AC′(Q) < 0,
when marginal cost is less than average cost.
Example 6: If

C(Q) = 10Q^2 + 20

then:

MC(Q) = 20Q,  AC(Q) = (10Q^2 + 20)/Q = 10Q + 20/Q

so that:

AC′(Q) = 10 − 20/Q^2 = (10/Q^2)(Q^2 − 2) = (10/Q^2)(Q + √2)(Q − √2)

so that AC(Q) is falling for Q < √2 and hence AC > MC, AC(Q) is increasing
for Q > √2 and hence AC < MC, and marginal and average cost are equal
when Q = √2, the minimum point of the average cost curve. You can see these
relationships below, where the straight line is MC(Q):

[Plot: MC(Q) and AC(Q) for 0 < Q ≤ 5; MC(Q) is the straight line, AC(Q) is U-shaped with its minimum at Q = √2]
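The cost-curve relationships in Example 6 can be confirmed numerically; a Python sketch (ours, not the text's):

```python
from math import sqrt, isclose

MC = lambda Q: 20 * Q
AC = lambda Q: 10 * Q + 20 / Q

Qmin = sqrt(2)
# At the minimum of AC, marginal and average cost are equal.
assert isclose(MC(Qmin), AC(Qmin))
# Below Qmin average cost exceeds marginal cost and AC is falling...
assert AC(1.0) > MC(1.0) and AC(1.0) > AC(1.2)
# ...and above Qmin marginal cost exceeds average cost and AC is rising.
assert MC(2.0) > AC(2.0) and AC(2.0) > AC(1.8)
```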
2.1.8 The Chain Rule

We will often be working with a function of a function. For example consider
the function:

h(x) = 1/(1 + x^2).

We can think of h(x) as consisting of two functions: an outside function
f(x) = 1/x and an inside function g(x) = 1 + x^2, that is:

h(x) = f(g(x)) = 1/g(x) = 1/(1 + x^2).

In general we have:

Definition 45 Given:

h(x) = f(g(x))

we call f(x) the outside function and g(x) the inside function.

At the moment we have no rule for finding h′(x). Suppose however that
we know how to calculate the derivative of the outside function f(x) and the
inside function g(x). The chain rule then allows us to calculate the derivative
of h(x) = f(g(x)) as:

Theorem 46 Chain Rule: If h(x) = f(g(x)) then:

h′(x) = f′(g(x)) g′(x).
In the beginning it is common for students to have trouble with the chain
rule. It should eventually become second nature but until then you might be
better off being very systematic. The chain rule can be broken down into a recipe
as follows:

A Recipe for the Chain Rule

1. Identify the outside function f(x) and the inside function g(x). (If you
are not sure, verify by putting the inside function g(x) inside f(x) as
f(g(x)) and make sure you get h(x).)

2. Take the derivative of the outside function: f′(x).

3. Replace x in f′(x) in 2 with the inside function g(x) to obtain f′(g(x)).

4. Take the derivative of the inside function: g′(x).

5. Multiply the result in 3 by that in 4 to get h′(x) = f′(g(x)) g′(x).

Remark: It is important to correctly identify the outside and inside functions.
If instead we were to put f(x) inside g(x) as g(f(x)) we obtain a different
function. For example with f(x) = 1/x and g(x) = 1 + x^2, if instead of f(g(x))
one calculated:

g(f(x)) = 1 + f(x)^2 = 1 + (1/x)^2 = 1 + 1/x^2

which is not the same as f(g(x)) = 1/(1 + x^2).

Example 1: For h(x) = 1/(1 + x^2), following the recipe we have:
1. The outside function is f(x) = 1/x and the inside function is g(x) = 1 + x^2.

2. Taking the derivative of the outside function we obtain: f'(x) = -1/x^2.

3. Putting the inside function inside the result in step 2 we obtain:

    f'(g(x)) = -1/g(x)^2 = -1/(1 + x^2)^2.
4. Taking the derivative of the inside function we obtain: g'(x) = 2x.

5. Multiplying the result in step 3 by that in step 4 we obtain:

    h'(x) = -1/(1 + x^2)^2 × 2x = -2x/(1 + x^2)^2.
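As an illustrative aside, the answer produced by the recipe can be checked against a numerical (finite-difference) derivative; the short Python sketch below does this for h(x) = 1/(1 + x^2).

```python
# Sanity check of the chain-rule result h'(x) = -2x/(1 + x^2)^2
# against a central finite-difference approximation (illustrative sketch).

def h(x):
    return 1.0 / (1.0 + x**2)

def h_prime(x):
    # Result of the chain-rule recipe: f'(g(x)) * g'(x)
    return -2.0 * x / (1.0 + x**2) ** 2

def numerical_derivative(f, x, eps=1e-6):
    # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(h_prime(x) - numerical_derivative(h, x)) < 1e-6
```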
Example 2: For h(x) = sqrt(1 + x^4) we have:

1. The outside function is f(x) = sqrt(x) = x^(1/2) and the inside function is g(x) = 1 + x^4. We verify this as: f(g(x)) = sqrt(g(x)) = sqrt(1 + x^4).

2. Taking the derivative of the outside function we obtain: f'(x) = (1/2) x^(-1/2) = 1/(2 sqrt(x)).

3. Putting the inside function inside the result in step 2 we obtain:

    f'(g(x)) = 1/(2 sqrt(g(x))) = 1/(2 sqrt(1 + x^4)).

4. Taking the derivative of the inside function we obtain: g'(x) = 4x^3.

5. Multiplying the result in step 3 by that in step 4 we obtain:

    h'(x) = 1/(2 sqrt(1 + x^4)) × 4x^3 = 4x^3/(2 sqrt(1 + x^4)) = 2x^3/sqrt(1 + x^4).
2.1.9 Inverse Functions
Given a function y = f(x) we will often want to reverse x and y and make x the dependent variable and y the independent variable.

For example, we usually think of a demand curve as having Q, the quantity, as the dependent variable and P, the price, as the independent variable, so we write Q = Q(P). In some applications, however, it is easier to make P the dependent variable and Q the independent variable and write P = P(Q), which is the inverse demand curve.
Example: Suppose we have the function y = f(x) = 6 - 3x or, changing the notation:

    Q = Q(P) = 6 - 3P

and so we think of this as an ordinary demand curve. This demand curve treats quantity Q as the dependent variable and price P as the independent variable.

Suppose we wanted instead to have P as the dependent variable, that is, the inverse demand curve. Putting P on the left-hand side as:

    Q = 6 - 3P  ⟹  3P = 6 - Q

we have:

    P = P(Q) = 2 - (1/3) Q.

Translating back into the x, y notation, the inverse demand curve takes the form y = g(x) = 2 - (1/3) x.
The essence of an inverse function, then, is that we reverse the roles of the independent variable x and the dependent variable y. Visually, one obtains an inverse function by taking a graph and flipping it around to put the y axis below and the x axis above. Thus a function and its inverse really express the same relationship between y and x.
Of course in order to prove things about inverse functions we need to define them. To see how this is done, consider the demand curve and the inverse demand curve in the x, y notation:

    f(x) = 6 - 3x,   g(x) = 2 - (1/3) x.

Suppose we put g(x) inside f(x). Then we obtain a remarkable result:

    f(g(x)) = 6 - 3(2 - (1/3) x) = x.
If instead we put f(x) inside g(x) then we get the same remarkable result:

    g(f(x)) = 2 - (1/3)(6 - 3x) = x.
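As a quick numerical illustration, composing the demand curve and its inverse in either order should return x unchanged; a minimal Python sketch:

```python
# Verify that f(x) = 6 - 3x and g(x) = 2 - x/3 are inverses:
# composing them in either order returns x (up to rounding).

def f(x):
    return 6.0 - 3.0 * x

def g(x):
    return 2.0 - x / 3.0

for x in [-1.0, 0.0, 0.5, 2.0, 10.0]:
    assert abs(f(g(x)) - x) < 1e-12
    assert abs(g(f(x)) - x) < 1e-12
```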
In both cases we obtain x. This is in fact the basis for the definition of an inverse function:

Definition 47 Inverse Function: Given a function f(x), if there exists another function g(x) such that

    f(g(x)) = g(f(x)) = x

then we say that g(x) is the inverse function of f(x) and f(x) is the inverse function of g(x).
Remark: If you think of x as trapped inside f(x), then applying the inverse function g(x) liberates x from f(x) since:

    g(f(x)) = x.

Similarly f(x) liberates x from g(x) since:

    f(g(x)) = x.

Often when attempting to solve equations we will want to do just this: to get x outside by itself. In this case inverse functions are the tool we need.
We have:

Theorem 48 If f(x) = x^n for x > 0 then the inverse function of f(x) is g(x) = x^(1/n).

Proof. If f(x) = x^n then

    g(f(x)) = g(x^n) = (x^n)^(1/n) = x^(n × (1/n)) = x^1 = x.
Example: If f(x) = x^7 with x > 0 then the inverse function is g(x) = x^(1/7). Suppose you wish to solve the equation:

    f(x) = x^7 = 3

for x; that is, you wish to get x alone by itself. Using the inverse function to free x we find that:

    g(f(x)) = g(3)  ⟹  x = 3^(1/7).
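The same calculation is easy to carry out numerically; a one-line Python check:

```python
# Solve x^7 = 3 by applying the inverse function g(x) = x^(1/7).
x_star = 3.0 ** (1.0 / 7.0)

# Substituting back into the original equation recovers 3.
assert abs(x_star ** 7 - 3.0) < 1e-9
```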
Not all functions have an inverse function; in fact only globally increasing
or decreasing functions have inverses:
Theorem 49 Existence of an Inverse Function: The inverse function for
f (x) exists if and only if f (x) is either globally increasing or globally decreasing.
Example: The function

    f(x) = (x - 1)^2

does not have an inverse function since f'(x) < 0 for x < 1 and f'(x) > 0 for x > 1, as illustrated below:

    [Plot: f(x) = (x - 1)^2]
The problem with this function is that if we flip the graph around and make x the dependent variable:

    [Plot: the graph of f(x) = (x - 1)^2 flipped]

we see that associated with each y are not one but two x's, and so this is not a proper function.
2.1.10 The Derivative of an Inverse Function
The question we now address is the relationship between the derivatives of two inverse functions f(x) and g(x). We have:

Theorem 50 Derivative of an Inverse Function: If f(x) has an inverse function g(x) then g'(x) is given by:

    g'(x) = 1/f'(g(x)).
Proof. Taking the derivative of both sides of

    f(g(x)) = x

and using the chain rule we obtain:

    f'(g(x)) g'(x) = 1.

Solving for g'(x) then gives the result.

Thus the slope of the inverse function g(x) is the inverse of the slope of the original function (with x replaced by g(x)).
Example 1: We saw that the demand and inverse demand curves written in x, y notation:

    f(x) = 6 - 3x,   g(x) = 2 - (1/3) x

are inverses of each other. We have f'(x) = -3 and g'(x) = -1/3, so that the two functions have derivatives which are inverses of each other.
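Theorem 50 can also be checked mechanically for this demand-curve pair; a minimal Python sketch:

```python
# Check Theorem 50, g'(x) = 1/f'(g(x)), for the demand curve
# f(x) = 6 - 3x and its inverse g(x) = 2 - x/3.

def g(x):
    return 2.0 - x / 3.0

def f_prime(x):
    # f(x) = 6 - 3x has constant slope -3.
    return -3.0

def g_prime_via_theorem(x):
    return 1.0 / f_prime(g(x))

# The theorem reproduces the known slope g'(x) = -1/3 at every x.
for x in [0.0, 1.0, 5.0]:
    assert abs(g_prime_via_theorem(x) - (-1.0 / 3.0)) < 1e-12
```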
Example 2: If

    f(x) = x^2

then f'(x) = 2x. The inverse function of f(x) is

    g(x) = x^(1/2)

and so g'(x) = (1/2) x^(-1/2). Alternatively we could calculate g'(x) from f(x) as:

    g'(x) = 1/f'(g(x)) = 1/(2 × g(x)) = 1/(2 × x^(1/2)) = (1/2) x^(-1/2).

2.1.11 The Elasticity of an Inverse Function
Suppose that f(x) has an inverse function g(x) and that the corresponding elasticities are:

    η_f(x) ≡ f'(x) x/f(x),   η_g(x) ≡ g'(x) x/g(x).

We have:
Theorem 51 If f(x) has an inverse function g(x) then:

    η_g(x) = 1/η_f(g(x)).
Proof. Since g(x) is the inverse function of f(x) we have:

    x = f(g(x)).

Replacing x by f(g(x)) we obtain:

    η_g(x) = g'(x) x/g(x) = g'(x) f(g(x))/g(x).

Now replacing g'(x) by 1/f'(g(x)) from Theorem 50 we obtain:

    η_g(x) = g'(x) f(g(x))/g(x)
           = f(g(x))/(f'(g(x)) g(x))
           = 1/(f'(g(x)) g(x)/f(g(x)))
           = 1/η_f(g(x)).
Example 1: Consider the function f(x) and its inverse g(x) given by:

    f(x) = x^3,   g(x) = x^(1/3).

Since both f(x) and g(x) are of the form A x^b, the elasticities of each are given by the exponents on x. Thus η_f(x) = 3 and η_g(x) = 1/3 = 1/η_f(x).
Example 2: Suppose a monopolist faces a demand curve Q = Q(P) that has an elasticity η_Q(P), and that the inverse demand curve P = P(Q) has an elasticity η_P(Q). Then from Theorem 51:

    η_P(Q) = 1/η_Q(P(Q)).

Now revenue for the monopolist as a function of Q is given by:

    R(Q) = Q × P(Q)

so that, from the product rule, marginal revenue is:

    MR(Q) = R'(Q)
          = P(Q) + Q × P'(Q)
          = P(Q) (1 + P'(Q) × Q/P(Q))
          = P(Q) (1 + η_P(Q))
          = P(Q) (1 + 1/η_Q(P(Q))).

Since the monopolist chooses Q where MR = MC, and since MC > 0, it follows that

    P(Q) (1 + 1/η_Q(P(Q))) > 0  ⟹  η_Q(P(Q)) < -1

so the monopolist always acts on the elastic part of the demand curve.
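As a quick numerical check of the marginal revenue formula, consider the hypothetical constant-elasticity demand curve Q = P^(-2) (so η_Q = -2 and the inverse demand is P(Q) = Q^(-1/2)); the Python sketch below compares MR(Q) = P(Q)(1 + 1/η_Q) with a finite-difference derivative of revenue.

```python
# Check MR(Q) = P(Q) * (1 + 1/eta_Q) for the illustrative demand
# curve Q = P^(-2), whose elasticity is eta_Q = -2 everywhere.

def P(Q):
    # Inverse demand implied by Q = P^(-2)
    return Q ** -0.5

def R(Q):
    # Revenue R(Q) = Q * P(Q)
    return Q * P(Q)

def MR_formula(Q, eta_Q=-2.0):
    return P(Q) * (1.0 + 1.0 / eta_Q)

def numerical_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

for Q in [0.5, 1.0, 2.0, 4.0]:
    assert abs(MR_formula(Q) - numerical_derivative(R, Q)) < 1e-6
```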
2.2 Second Derivatives
He who can digest a second or third derivative ... need not, we think,
be squeamish about any point of divinity. -George Berkeley
Since the derivative f'(x) is a function, it too has a derivative, which is the second derivative of f(x). We have then:

Definition 52 Second Derivative: The second derivative of f(x), denoted by f''(x) or d²y/dx² or d/dx (f'(x)), is the first derivative of f'(x).
Example: Consider the function:

    f(x) = x^3 - x  ⟹  f'(x) = 3x^2 - 1.

The second derivative f''(x) is then the first derivative of the first derivative, or:

    f'(x) = 3x^2 - 1  ⟹  f''(x) = 6x.
2.2.1 Convexity and Concavity
Alice didn’t dare to argue the point, but went on: “and I thought I’d try and find my way to the top of that hill.” “When you say ‘hill’,” the Queen interrupted, “I could show you hills, in comparison with which you’d call that a valley.” “No, I shouldn’t,” said Alice, surprised into contradicting her at last: “a hill can’t be a valley, you know. That would be nonsense.”
-Lewis Carroll, Through the Looking Glass
While the sign of f'(x) tells you if the function is upward or downward sloping, the sign of f''(x) tells you whether you are standing on a mountain or in a valley, or in mathematical jargon, whether the function is concave or convex. We have:
Definition 53 Local Concavity: The function f(x) is locally concave at x_o (or locally mountain-like) if and only if f''(x_o) < 0.

Definition 54 (Global) Concavity: The function f(x) is (globally) concave (or globally mountain-like) if and only if f''(x) < 0 for all x in the domain of f(x).

Definition 55 Local Convexity: The function f(x) is locally convex at x_o (or locally valley-like) if and only if f''(x_o) > 0.

Definition 56 (Global) Convexity: The function f(x) is (globally) convex (or globally valley-like) if and only if f''(x) > 0 for all x in the domain of f(x).
Concavity and convexity are fundamental concepts and so we will be referring
to them often. It will quickly become tedious if we always have to qualify
concavity and convexity with either ‘local’ or ‘global’. For this reason we will
adopt the following convention:
Convention: When we say a function is concave without saying ‘global’ or
‘local’, we mean the function is globally concave. Similarly if we say a function
is convex without saying ‘global’ or ‘local’, we mean the function is globally
convex.
Example 1: Given

    f(x) = x^3 - x

if x_0 = -1 then

    f'(-1) = 2 > 0 and f''(-1) = -6 < 0

and so at x_0 = -1, f(x) is locally increasing (or upward sloping) and locally concave (locally mountain-like).

At x_0 = 1

    f'(1) = 2 > 0 and f''(1) = 6 > 0

and so at x_0 = 1, f(x) is locally increasing and locally convex (or locally valley-like).

More generally, since f''(x) = 6x it follows that f''(x) < 0 for x < 0 and hence f(x) is locally concave (or locally mountain-like) for x < 0. Similarly for x > 0, f''(x) > 0 and so f(x) is locally convex (or locally valley-like).
Example 2: Given:

    f(x) = 1/x = x^(-1) for x > 0

we have

    f'(x) = -x^(-2) = -1/x^2 < 0

for all x. Hence f(x) is globally decreasing. Furthermore:

    f''(x) = 2x^(-3) = 2/x^3 > 0

for all x, so that f(x) is globally convex. These properties of f(x) are illustrated in the plot below:

    [Plot: f(x) = 1/x]
Example 3: Given the function:

    f(x) = -1/x = -x^(-1) for x > 0

we have

    f'(x) = x^(-2) = 1/x^2 > 0

for all x. Hence f(x) is (globally) increasing or monotonic. Furthermore:

    f''(x) = -2x^(-3) = -2/x^3 < 0

for all x and so f(x) is globally concave or globally mountain-like. This is illustrated in the plot below:

    [Plot: f(x) = -1/x]

2.2.2 Economics and ‘Diminishing Marginal ...’
In economics one often hears the expression ‘diminishing marginal ...’. Recall that the marginal is the derivative f'(x). Thus if the marginal is decreasing, the first derivative of f'(x), namely f''(x), must be negative:

    f''(x) = d/dx (f'(x)) < 0.

Thus stating that the marginal is decreasing is equivalent to stating that f''(x) < 0, or that the function is concave (mountain-like).
Example: The Cobb-Douglas production function:

    Q = f(L) = sqrt(L)

plotted below:

    [Plot: the production function Q = sqrt(L)]

has a marginal product of labour:

    MP_L(L) = f'(L) = 1/(2 sqrt(L))

which, although positive, decreases as L increases, as plotted below:

    [Plot: MP_L(L) = 1/(2 sqrt(L))]

Equivalently,

    MP_L'(L) = f''(L) = -1/(4 (sqrt(L))^3) < 0

and so a diminishing marginal product of labour is equivalent to the production function being concave or mountain-like.
2.3 Maximization and Minimization

2.3.1 First-Order Conditions
The cornerstone of economic thinking is that people are assumed to be rational. Rational behavior generally means maximizing or minimizing something. Thus rational households maximize utility and rational firms maximize profits. Both profit maximizers and utility maximizers must minimize costs.

A maximum (minimum) is found on the top of a mountain (bottom of a valley) at a point where the mountain (valley) is flat. If it were not flat then you could always go a little higher (lower) by moving up or down the slope.

This intuition leads to the first-order conditions for a maximum or minimum:

Theorem 57 First-Order Condition for a Maximum. If f(x) is maximized at x = x* then f'(x*) = 0.
Theorem 58 First-Order Condition for a Minimum. If f(x) is minimized at x = x* then f'(x*) = 0.

Remark: Calculating first-order conditions is one of the most basic skills that an economist must have. Although this is often straightforward, there are nevertheless common problems that occur when students do not derive them systematically. For this reason you may wish at the beginning to use the following recipe:

Recipe for Calculating First-Order Conditions

Given a function f(x) to be maximized or minimized:

1. Calculate the first derivative f'(x).

2. Replace all occurrences of x in step 1 with x* and set the resulting expression equal to 0.

3. If possible solve the expression in step 2 for x*, or if this is not possible, try to learn something about x* from the first-order conditions.
Example 1: Consider the function:

    f(x) = x^3 - x.

Following the recipe we have:

1. Calculate the derivative of f(x):

    f'(x) = 3x^2 - 1.

2. Put a * on x in step 1 and set the result equal to 0. Thus:

    f'(x*) = 3(x*)^2 - 1 = 0.

3. Solving the expression in step 2 we find that:

    3(x*)^2 - 1 = 0  ⟹  (x*)^2 = 1/3  ⟹  x_1* = 1/sqrt(3) and x_2* = -1/sqrt(3).
Example 2: Consider the function

    f(x) = x^(1/2) - x

with domain x ≥ 0. Following the recipe we have:

1. Calculate the derivative of f(x):

    f'(x) = (1/2) x^(-1/2) - 1.

2. Put a * on x in step 1 and set the result equal to 0. Thus:

    f'(x*) = (1/2)(x*)^(-1/2) - 1 = 0.

3. Solving the expression in step 2 we find that:

    (1/2)(x*)^(-1/2) - 1 = 0  ⟹  (x*)^(-1/2) = 2  ⟹  x* = 2^(-2) = 1/4.
2.3.2 Second-Order Conditions
The first-order conditions for a maximum and a minimum are identical, and so just from f'(x*) = 0 we have no way of knowing whether x* is a maximum or a minimum. This is where the second derivative f''(x) and the second-order conditions become useful. The basic principle is that mountains (concave functions with f''(x) < 0) have tops or maxima, and valleys (convex functions with f''(x) > 0) have bottoms or minima.

For the moment we will only deal with local maxima and minima. If x* satisfies the first-order conditions f'(x*) = 0 and at x* the function is locally mountain-like, then x* must be a local maximum. If at x* the function is locally valley-like, then x* must be a local minimum.

Consequently we have:
Theorem 59 Second-Order Conditions for a Local Maximum. If f'(x*) = 0 and f''(x*) < 0 (i.e., f(x) is locally concave or mountain-like at x*) then x* is a local maximum.

Theorem 60 Second-Order Conditions for a Local Minimum. If f'(x*) = 0 and f''(x*) > 0 (i.e., f(x) is locally convex or valley-like at x*) then x* is a local minimum.
Example 1 (continued): For the function:

    f(x) = x^3 - x

the solutions to the first-order conditions are x_1* = 1/sqrt(3) and x_2* = -1/sqrt(3). Calculating f''(x) we have:

    f'(x) = 3x^2 - 1  ⟹  f''(x) = 6x.

To begin, consider x_1* = 1/sqrt(3). We find that:

    f''(1/sqrt(3)) = 6 × 1/sqrt(3) = 3.4641 > 0

so that f(x) is locally convex at x_1* and hence from the second-order conditions x_1* = 1/sqrt(3) is a local minimum.

For x_2* = -1/sqrt(3) we find that

    f''(-1/sqrt(3)) = 6 × (-1/sqrt(3)) = -3.4641 < 0

so that f(x) is locally concave at x_2* and hence from the second-order conditions x_2* = -1/sqrt(3) is a local maximum.
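The classification step reduces to checking the sign of f'' at each stationary point, which the short Python sketch below makes explicit.

```python
# Second-order conditions for f(x) = x^3 - x, where f''(x) = 6x:
# the sign of f'' at each stationary point classifies it.

import math

def f_second(x):
    return 6.0 * x

x1 = 1.0 / math.sqrt(3.0)    # stationary point from the first-order conditions
x2 = -1.0 / math.sqrt(3.0)

assert f_second(x1) > 0      # locally convex, so x1 is a local minimum
assert f_second(x2) < 0      # locally concave, so x2 is a local maximum
```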
Example 2 (continued): For the function:

    f(x) = x^(1/2) - x

the solution to the first-order conditions is x* = 1/4. Now since

    f''(x) = -(1/4) x^(-3/2)  ⟹  f''(1/4) = -(1/4)(1/4)^(-3/2) = -2 < 0

it follows that f(x) is locally concave (or mountain-like) at x* = 1/4 and hence x* = 1/4 is a local maximum.

2.3.3 Sufficient Conditions for a Global Maximum or Minimum
In economics we are usually only interested in global maxima or minima. A profit maximizing firm would not choose a local profit maximum if it were not a global maximum. If x* satisfies the first- and second-order conditions then all we can say is that x* is a local maximum or a local minimum. We do not know whether we are at a global maximum or a global minimum.

If a function f(x) is globally concave then it is everywhere mountain-like; essentially f(x) is one mountain. Now if you find a flat spot on this one mountain it must be a global maximum; there can be no higher point on the mountain.

Similarly, if a function f(x) is globally convex then it is everywhere valley-like, so that essentially f(x) is one valley. Now if you find a flat spot on this one valley it must be a global minimum.

Thus local concavity or convexity ensures that if f'(x*) = 0 then x* is a local maximum or minimum. Global concavity or convexity, on the other hand, ensures that x* is a global maximum or minimum.
Theorem 61 If a function f(x) is globally concave, so that f''(x) < 0 for all x, and x* satisfies the first-order conditions f'(x*) = 0, then x* is a unique global maximum.

Theorem 62 If a function f(x) is globally convex, so that f''(x) > 0 for all x, and x* satisfies the first-order conditions f'(x*) = 0, then x* is a unique global minimum.
Note that this is a sufficient condition for x* to be a global maximum (minimum); it is not a necessary condition for a global maximum (minimum). As we shall see, there are functions which are not globally concave or convex which nevertheless have a unique global maximum or minimum.
Example 1: The function:

    f(x) = x^3 - x

actually has no global maximum or minimum since f(x) → ±∞ as x → ±∞, as shown in the plot below:

    [Plot: f(x) = x^3 - x]

However, if we restrict the domain to be x > 0 then, since x > 0:

    f''(x) = 6x > 0

and so the function is globally convex.

We saw there were two solutions to the first-order conditions: x_1* = 1/sqrt(3) and x_2* = -1/sqrt(3). Since x_2* is negative it is not now in the domain of f(x), so that x_1* = 1/sqrt(3) is the unique global minimum. This is illustrated in the plot below:

    [Plot: f(x) = x^3 - x for x > 0]
Example 2: Consider the function

    f(x) = x^(1/2) - x

with domain x ≥ 0, which is plotted below:

    [Plot: f(x) = x^(1/2) - x]

Now since

    f''(x) = -(1/4) x^(-3/2) < 0

for all x (since x^(-3/2) > 0), it follows that f(x) is globally concave and x* = 1/4 is the unique global maximum.
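Global concavity means the stationary point beats every other point in the domain, which a brute-force grid comparison can illustrate:

```python
# f(x) = x^(1/2) - x is globally concave on x >= 0, so the stationary
# point x* = 1/4 should beat every point on a grid over [0, 2].

def f(x):
    return x ** 0.5 - x

x_star = 0.25
grid = [i / 1000.0 for i in range(0, 2001)]   # x in [0, 2]
assert all(f(x) <= f(x_star) + 1e-12 for x in grid)
```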
2.3.4 Profit Maximization
Then I looked on all the works that my hands had wrought, and on the labour that I had laboured to do; and, behold, all was vanity and a striving after wind, and there was no profit under the sun.
-Ecclesiastes 2:11
One should always generalize. -Karl Jacobi
Consider the problem of a firm maximizing profits in the short run. We are going to look at this problem from various levels of generality, beginning with a very simple special case and from there increasing the level of generality. Often in textbooks one sees the most general case first and then looks at specific examples. Knowledge in real life does not usually progress in this manner; rather, new ideas typically begin with special cases from which a researcher develops some understanding and curiosity. From there he or she then attempts to generalize the results of the special case.
Example 1: Consider the least general case of a firm with Cobb-Douglas production function where:

    Q = f(L) = L^(1/2).

Profits for the firm are:

    π(L) = P f(L) - W L = P L^(1/2) - W L

where P is the price the firm receives and W is the nominal wage. Differentiating with respect to L we find that:

    π'(L) = (1/2) P L^(-1/2) - W

so that putting a * on L and setting the derivative equal to 0 yields the first-order condition for profit maximization:

    (1/2) P (L*)^(-1/2) - W = 0.

If we define w = W/P as the real wage then this implies that:

    (1/2)(L*)^(-1/2) = w

or:

    L* = (1/4) w^(-2).
This gives the profit maximizing L* as a function of the real price of labour w and so is the firm's labour demand curve. Since L* has the functional form A x^b, the elasticity of demand is the exponent on w, or -2.

    [Plot: the labour demand curve L*(w) = (1/4) w^(-2)]

Furthermore L* is a global maximum since:

    π''(L) = -(1/4) P L^(-3/2) < 0

and hence π(L) is globally concave.
If P = 4 and W = 2 then w = 2/4 = 1/2 and the firm would hire

    L* = (1/4) w^(-2) = (1/4)(1/2)^(-2) = 1

or one worker. If there were 100% inflation, so that P and W doubled to P = 8 and W = 4, then the real wage remains the same at w = 1/2 and so L* stays at L* = 1. This reflects the fact that a rational firm only cares about the real wage when hiring.
The firm's supply curve is found by substituting the optimal amount of labour L* = (1/4) w^(-2) into the production function Q = f(L) = L^(1/2), so that:

    Q* = ((1/4) w^(-2))^(1/2)
       = (1/2) w^(-1)
       = (1/2) (W/P)^(-1)
       = (1/2) p

where p = P/W is the real price of the good Q. Thus dQ*/dp = 1/2 > 0 and the supply curve slopes upwards. Note that the supply curve has the form A x^b and so the elasticity of supply is the exponent on p, or 1.
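The closed-form labour demand from this example can be checked against a brute-force maximization of the profit function; a minimal Python sketch using the numbers P = 4, W = 2 from the text:

```python
# Profit maximization with Q = L^(1/2): the labour demand derived in
# the text is L* = (1/4) w^(-2) with w = W/P. With P = 4, W = 2 we
# have w = 1/2 and L* = 1; compare against a grid search over L.

P, W = 4.0, 2.0
w = W / P

def profit(L):
    return P * L ** 0.5 - W * L

L_star = 0.25 * w ** -2.0
assert abs(L_star - 1.0) < 1e-12

grid = [i / 1000.0 for i in range(1, 5001)]   # L in (0, 5]
best = max(grid, key=profit)
assert abs(best - L_star) < 1e-9
```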
Example 2: We have found that the firm's labour demand curve slopes downwards and its supply curve slopes upwards, that both depend on the real prices w and p, and that both have a constant elasticity. Suppose you were the first to actually derive these results. If you are curious you would then ask yourself whether these results are merely a coincidence or indicative of more general results.

We therefore seek to generalize. To do this we could replace the exponent 1/2 on Q = L^(1/2) with say 1/3 and redo the analysis. Once this is done we could then replace 1/3 with 2/5 and so on. The problem with this approach is that there are an infinite number of possible exponents on L we could use, so we would never arrive at any firm conclusions.
Suppose instead we replace 1/2 not with a number but with a letter α and write the firm's production function as:

    Q = f(L) = A L^α

where A > 0 and where we assume that:

    0 < α < 1.

By doing this we will be able to analyze an infinite number of possible exponents at once!

The assumptions on α ensure a positive and diminishing marginal product of labour since:

    MP_L(L) ≡ f'(L) = α A L^(α-1) > 0

and:

    f''(L) = dMP_L(L)/dL = α(α - 1) A L^(α-2) < 0

since α < 1.
Profits for the firm are given by:

    π(L) = P f(L) - W L = P A L^α - W L

so that:

    π'(L) = α P A L^(α-1) - W.

Putting a * on L and setting the result equal to 0 yields the first-order condition for profit maximization:

    α P A (L*)^(α-1) - W = 0.

Solving for L* we obtain the labour demand curve:

    L* = (αA)^(1/(1-α)) w^(1/(α-1))
where w is the real wage. This has the form A x^b and so the elasticity of demand is the exponent on w, or:

    1/(α - 1) < 0

since α < 1. It follows then that the labour demand curve slopes downwards. Note this includes the results of the previous example as a special case, since if α = 1/2 the elasticity becomes

    1/(1/2 - 1) = -2.
This is a global profit maximum since:

    π''(L) = P α(α - 1) A L^(α-2) < 0

so that π(L) is globally concave.
The supply curve is found by substituting L* into Q = f(L) so that:

    Q* = f(L*) = A ((αA)^(1/(1-α)) w^(1/(α-1)))^α
       = A^(1/(1-α)) α^(α/(1-α)) w^(α/(α-1))
       = A^(1/(1-α)) α^(α/(1-α)) (W/P)^(α/(α-1))
       = A^(1/(1-α)) α^(α/(1-α)) p^(α/(1-α))
       = B p^(α/(1-α))

where B = A^(1/(1-α)) α^(α/(1-α)) and p = P/W is the real price of Q. This also has the form A x^b and so the elasticity of supply is the exponent on p:

    α/(1 - α) > 0

and so the supply curve slopes upwards. This includes the previous result as a special case, since if α = 1/2 the elasticity of supply is

    (1/2)/(1 - 1/2) = 1.
Example 3: We see from the previous example that the elasticities change when we change α from 1/2, but that the firm's labour demand curve still slopes downward and the supply curve still slopes upward. If however we allowed α > 1 then the expression for the labour demand curve elasticity would become positive; that is:

    1/(α - 1) > 0.

However α > 1 would also mean that we would no longer have a diminishing marginal product of labour, and the profit function would no longer be concave but instead would be convex. All these clues point to the fact that it is the diminishing marginal product of labour that is the key requirement in obtaining a downward sloping labour demand curve and an upward sloping supply curve.
We now attempt to generalize even further. Consider replacing Q = A L^α with:

    Q = f(L)

where the only assumptions we now make are that the marginal product of labour is positive and diminishing, so that MP_L(L) = f'(L) > 0 and MP_L'(L) = f''(L) < 0.

Profits as a function of L are then given by:

    π(L) = P f(L) - W L.

Differentiating with respect to L we find that:

    π'(L) = P f'(L) - W

so that putting a * on L and setting the derivative equal to 0 yields the first-order condition for profit maximization:

    π'(L*) = 0  ⟹  P f'(L*) - W = 0  ⟹  MP_L(L*) = w.

This result shows that the inverse labour demand curve is in fact the marginal product of labour curve. Furthermore it shows that labour demand L* is a function of the real wage w.

Since

    π''(L) = P f''(L) < 0

for all L, the profit function is globally concave and L* is a global maximum.
Consider now the problem of showing that the labour demand curve L* = L*(w) is downward sloping. The labour demand curve is implicitly defined by:

    MP_L(L*(w)) = w.

We would like to find the derivative dL*(w)/dw and show that dL*(w)/dw < 0, that is, that the demand for labour slopes downwards. The problem is that L*(w) is trapped inside the marginal product of labour MP_L(L) and so we cannot get at it directly. We can however use the chain rule to get an expression for dL*(w)/dw. Here MP_L(L) is the outside function and L*(w) is the inside function. Thus differentiating both sides with respect to the real wage w we obtain:

    d/dw MP_L(L*(w)) = d/dw w  ⟹  MP_L'(L*(w)) dL*(w)/dw = 1.

Note that the chain rule forces dL*(w)/dw outside where we can now work with it. Since we assume a diminishing marginal product of labour, MP_L'(L) < 0, we conclude that:

    dL*(w)/dw = 1/MP_L'(L*(w)) < 0

which shows that the labour demand curve is downward sloping.
The firm's supply curve is given by replacing L with L*(w) in the production function as:

    Q* = f(L*(w)) = f(L*(p^(-1)))

where p = P/W and hence p^(-1) = W/P = w. Consider now showing that the supply curve slopes upwards, that is dQ*/dp > 0. Here p is buried deep inside f(L), L*(w) and p^(-1), so we will need to use the chain rule three times! We have:

    dQ*/dp = f'(L*(w)) × dL*(w)/dw × dp^(-1)/dp
           = f'(L*(w)) × dL*(w)/dw × (-1/p^2)
           > 0

since f'(L*(w)) > 0 while dL*(w)/dw < 0 and -1/p^2 < 0, so the product of one positive and two negative factors is positive. Thus the firm's supply curve is upward sloping.

This problem illustrates how one can obtain very general results with very minimal assumptions. We have shown that, given only a positive and diminishing marginal product of labour, labour demand must slope downwards and supply must slope upwards. Furthermore demand and supply are functions of the real prices w and p.
2.4 Econometrics

Econometrics is the bridge between theory and real life. -James Ramsey

2.4.1 Least Squares Estimation
Estimation of a Constant: μ

Consider the simple linear regression model:

    Y_i = μ + e_i,  i = 1, 2, ..., n.

Here e_i is random noise. If it were not for this random noise each Y_i would be identical: Y_i = μ. Since the data is corrupted, however, we do not get to directly observe μ but only a sample of the Y_i's. Our problem is to guess what μ is.
Our guess is μ̂, the least squares estimator of μ, which minimizes the sum of squares function:

    S(μ) = Σ_{i=1}^{n} (Y_i - μ)^2 = (Y_1 - μ)^2 + (Y_2 - μ)^2 + ··· + (Y_n - μ)^2.

Here μ plays the role of the x variable and μ̂ is x*. The function S(μ) is in fact just a quadratic.
Using the sum and chain rules to differentiate S(μ) we have:

    S'(μ) = -2(Y_1 - μ) - 2(Y_2 - μ) - ··· - 2(Y_n - μ)
          = -2(Y_1 + Y_2 + ··· + Y_n - nμ).

It follows then that the first-order conditions require:

    S'(μ̂) = 0 = -2(Y_1 + Y_2 + ··· + Y_n - nμ̂)
    ⟹  nμ̂ = Y_1 + Y_2 + ··· + Y_n
    ⟹  μ̂ = (Y_1 + Y_2 + ··· + Y_n)/n = Ȳ.

Thus our best guess of μ is the sample mean of the Y_i's.
S(μ) is globally convex since:

    S''(μ) = 2 + 2 + ··· + 2 (n times) = 2n > 0

and so μ̂ is in fact a global minimum of S(μ).
Example: Suppose one has data on the consumption of n = 4 families:

    i:    1   2   3   4
    Y_i:  72  58  63  55

Here each family consumes a different amount than μ because of random noise e_i (e.g., unexpected dental bills). To find μ̂ we construct the sum of squares function:

    S(μ) = (72 - μ)^2 + (58 - μ)^2 + (63 - μ)^2 + (55 - μ)^2

which is plotted below:

    [Plot: S(μ) against μ]

As illustrated in the graph, the minimum of S(μ) occurs at the sample mean

    μ̂ = Ȳ = (72 + 58 + 63 + 55)/4 = 62.
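The claim that the sample mean minimizes the sum of squares is easy to verify directly; a minimal Python sketch with the consumption data from the text:

```python
# Least squares estimate of a constant: with data 72, 58, 63, 55 the
# sample mean 62 should minimize S(mu) = sum of (Y_i - mu)^2.

Y = [72.0, 58.0, 63.0, 55.0]

def S(mu):
    return sum((y - mu) ** 2 for y in Y)

mu_hat = sum(Y) / len(Y)
assert mu_hat == 62.0

# S is strictly larger at any other candidate value of mu.
for mu in [55.0, 60.0, 61.9, 62.1, 70.0]:
    assert S(mu) > S(mu_hat)
```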
Linear Regression

Now suppose Y_i varied systematically with another variable, called a regressor, X_i, as:

    Y_i = X_i β + e_i,  i = 1, 2, ..., n.

For example, if Y_i is the consumption of family i then X_i might be their income, so that β would be the marginal propensity to consume since β = dY_i/dX_i. We again cannot directly observe β from the data because the data is corrupted by the random noise e_i. Instead we estimate β from a set of n observations on Y_i and X_i using the least squares estimator β̂ which minimizes:

    S(β) = Σ_{i=1}^{n} (Y_i - X_i β)^2 = (Y_1 - X_1 β)^2 + (Y_2 - X_2 β)^2 + ··· + (Y_n - X_n β)^2.
Using the sum and chain rules to differentiate S(β) we have:

    S'(β) = -2X_1(Y_1 - X_1 β) - 2X_2(Y_2 - X_2 β) - ··· - 2X_n(Y_n - X_n β)

so that the first-order conditions require:

    S'(β̂) = 0 = -2X_1(Y_1 - X_1 β̂) - 2X_2(Y_2 - X_2 β̂) - ··· - 2X_n(Y_n - X_n β̂)
    ⟹  X_1(Y_1 - X_1 β̂) + X_2(Y_2 - X_2 β̂) + ··· + X_n(Y_n - X_n β̂) = 0
    ⟹  (X_1^2 + X_2^2 + ··· + X_n^2) β̂ = X_1 Y_1 + X_2 Y_2 + ··· + X_n Y_n
    ⟹  β̂ = (X_1 Y_1 + X_2 Y_2 + ··· + X_n Y_n)/(X_1^2 + X_2^2 + ··· + X_n^2).

S(β) is globally convex as long as X_i ≠ 0 for at least one i (i.e., at least one family has a non-zero income) since:

    S''(β) = 2X_1^2 + 2X_2^2 + ··· + 2X_n^2 > 0.

It follows that the least squares estimator β̂ is a global minimum.
Example: Suppose one has data on the consumption of n = 4 families along with their income:

    i:    1   2   3   4
    Y_i:  72  58  63  55
    X_i:  98  80  91  73

so that, for example, family 2 has consumption of 58 and an income of 80. We seek the best line Y = βX which goes through the data plotted below:

    [Scatter plot of the consumption and income data]

The sum of squares is then:

    S(β) = (72 - 98β)^2 + (58 - 80β)^2 + (63 - 91β)^2 + (55 - 73β)^2

which is plotted below:

    [Plot: S(β) against β]

As illustrated in the graph, the minimum of S(β) occurs at:

    β̂ = (X_1 Y_1 + X_2 Y_2 + ··· + X_n Y_n)/(X_1^2 + X_2^2 + ··· + X_n^2)
      = (98×72 + 80×58 + 91×63 + 73×55)/(98^2 + 80^2 + 91^2 + 73^2)
      = 0.724.

Thus the estimated marginal propensity to consume from this data set is β̂ = 0.724.
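The estimate can be reproduced and checked against the sum-of-squares criterion; a minimal Python sketch with the data from the text:

```python
# Least squares slope through the origin: beta_hat = sum(X*Y)/sum(X^2)
# for the consumption/income data in the text.

Y = [72.0, 58.0, 63.0, 55.0]
X = [98.0, 80.0, 91.0, 73.0]

beta_hat = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
assert round(beta_hat, 3) == 0.724

def S(beta):
    return sum((y - x * beta) ** 2 for x, y in zip(X, Y))

# The sum of squares is larger at any other candidate slope.
for beta in [0.5, 0.7, 0.75, 0.9]:
    assert S(beta) > S(beta_hat)
```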
2.4.2 Maximum Likelihood
Maximum likelihood is a very general technique used in econometrics which can be applied to almost any problem, including those where linear regression fails. The basic approach is to calculate the likelihood $L(\theta)$ where $\theta$ is a parameter of interest. The maximum likelihood estimator of $\theta$ is then that $\theta$ which maximizes $L(\theta)$. This is traditionally denoted by $\hat\theta$ and hence solves the first-order conditions:
$$\frac{dL(\hat\theta)}{d\theta} = 0.$$
It is usually easier to maximize the log-likelihood defined by $l(\theta) = \ln(L(\theta))$, which gives the same result since $\ln(x)$ is monotonic; that is:
$$\frac{dl(\hat\theta)}{d\theta} = 0 \iff \frac{dl(\hat\theta)}{d\theta} = \frac{1}{L(\hat\theta)}\frac{dL(\hat\theta)}{d\theta} = 0.$$
Once $\hat\theta$ is found from the first-order conditions, we often wish to construct a confidence interval which will then indicate how accurate our guess is. Traditionally one constructs a 95% confidence interval for the unknown $\theta$, which takes the form:
$$\hat\theta \pm 1.96 \times \sqrt{\delta}$$
where $\delta$ is the variance of $\hat\theta$. This formula says that $\theta$ will lie within the interval:
$$\hat\theta - 1.96 \times \sqrt{\delta} \le \theta \le \hat\theta + 1.96 \times \sqrt{\delta}$$
95 times out of 100, or equivalently, 19 times out of 20.
To construct our confidence interval calculate $\delta$ using:
$$\delta = \left(-\frac{d^2 l(\hat\theta)}{d\theta^2}\right)^{-1}.$$
Note that since we are maximizing the log-likelihood, from the second-order conditions $\frac{d^2 l(\hat\theta)}{d\theta^2} < 0$ so that $\delta > 0$.
Example: Suppose an unknown proportion $\theta$ of the population favour some policy while $1 - \theta$ are against this policy. You decide to conduct a survey of $n$ randomly chosen people to estimate the unknown $\theta$. Let $m_i = 1$ if the $i$th person says he supports the policy, and $m_i = 0$ if he says he does not support the policy. Since $\theta$ is the probability that $m_i = 1$ and $1 - \theta$ is the probability that $m_i = 0$, the probability of $m_i$ is given by:
$$\Pr[m_i] = \theta^{m_i}(1-\theta)^{1-m_i}.$$
Since each person is chosen independently, the likelihood is the product of these probabilities:
$$L(\theta) = \theta^{m_1}(1-\theta)^{1-m_1} \times \theta^{m_2}(1-\theta)^{1-m_2} \times \cdots \times \theta^{m_n}(1-\theta)^{1-m_n} = \theta^m(1-\theta)^{n-m}$$
where $m$ is the number of people in your survey who favour the policy and $n - m$ is the number of people in the survey against the policy. The log-likelihood is then:
$$l(\theta) = \ln(L(\theta)) = m\ln(\theta) + (n-m)\ln(1-\theta).$$
Using the chain rule we find that:
$$\frac{dl(\theta)}{d\theta} = \frac{m}{\theta} - \frac{n-m}{1-\theta} \Rightarrow \frac{dl(\hat\theta)}{d\theta} = 0 = \frac{m}{\hat\theta} - \frac{n-m}{1-\hat\theta} \Rightarrow \hat\theta = \frac{m}{n}.$$
Thus if $m = 525$ of the $n = 2000$ people interviewed say they are in favour, then the log-likelihood is:
$$l(\theta) = 525\ln(\theta) + (2000 - 525)\ln(1-\theta)$$
which is plotted below:

[Plot of $l(\theta)$ against $\theta$.]
As illustrated in the graph, the maximum occurs at:
$$\hat\theta = \frac{525}{2000} = 0.2625$$
which says that the best guess about $\theta$, based on the poll, is that about 26 percent of the population are in favour of the policy.
To calculate a confidence interval for $\hat\theta = 0.2625$ we use:
$$\delta = \left(-\frac{d^2 l(\hat\theta)}{d\theta^2}\right)^{-1} = \left(\frac{m}{\hat\theta^2} + \frac{n-m}{(1-\hat\theta)^2}\right)^{-1} = \frac{\hat\theta(1-\hat\theta)}{n}$$
where the last equality uses $m = n\hat\theta$ and $n - m = n(1-\hat\theta)$,
so that a 95% confidence interval for the unknown $\theta$ takes the form:
$$\hat\theta \pm 1.96 \times \sqrt{\frac{\hat\theta\left(1-\hat\theta\right)}{n}}.$$
Thus if $m = 525$ out of $n = 2000$ people polled are in favour of the policy, then:
$$\delta = \frac{0.2625(1 - 0.2625)}{2000}$$
and the confidence interval is:
$$0.2625 \pm 1.96 \times \sqrt{\frac{0.2625(1 - 0.2625)}{2000}}$$
or $0.2625 \pm 0.019$. Thus the poll would be accurate to within 1.9 percentage points 19 times out of 20 (or 95 times out of 100).
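The whole calculation, from $\hat\theta = m/n$ to the 95% confidence interval, can be sketched in a few lines; this illustration uses only the standard library and the formulas derived above:

```python
import math

# MLE and 95% confidence interval for a proportion, as derived in the text:
# theta_hat = m/n and delta = theta_hat * (1 - theta_hat) / n.
m, n = 525, 2000                      # people in favour, people polled
theta_hat = m / n                     # 0.2625
delta = theta_hat * (1 - theta_hat) / n
half_width = 1.96 * math.sqrt(delta)  # roughly 0.019
print(theta_hat - half_width, theta_hat + half_width)
```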
2.5 Ordinal and Cardinal Properties

2.5.1 Class Grades
Consider a class of 4 students: John, Mary, Joe, and Sue. Suppose the instructor gives an A to the student with the highest grade, a B to the next highest and so on. The numerical and letter grades might then look like this:
$$\begin{array}{cc} \text{John} & 75\ \text{B} \\ \text{Mary} & 50\ \text{D} \\ \text{Joe} & 65\ \text{C} \\ \text{Sue} & 85\ \text{A} \end{array}$$
Now suppose instead that the instructor adjusts (or bells) the grades by applying a monotonic function $g(x)$ (with $g'(x) > 0$) to each grade. This could be $g(x) = x - 3.75$, which would ensure that the class average is 65 and which satisfies $g'(x) = 1 > 0$, or something crazy like $g(x) = x^2$ which yields:
$$\begin{array}{cc} \text{John} & 75^2 = 5625\ \text{B} \\ \text{Mary} & 50^2 = 2500\ \text{D} \\ \text{Joe} & 65^2 = 4225\ \text{C} \\ \text{Sue} & 85^2 = 7225\ \text{A} \end{array}$$
Notice that the numerical grades change when g (x) is applied (for example
Joe’s grade changes from 65 to 4225) but that the letter grades do not change
(for example Joe received a C before the grades were adjusted and he receives
a C after the grades are adjusted.)
If instead the instructor used $g(x) = x^3$ we would find that:
$$\begin{array}{cc} \text{John} & 75^3 = 421875\ \text{B} \\ \text{Mary} & 50^3 = 125000\ \text{D} \\ \text{Joe} & 65^3 = 274625\ \text{C} \\ \text{Sue} & 85^3 = 614125\ \text{A} \end{array}$$
and all letter grades still remain unchanged.
Adjusting the grades with some monotonic $g(x)$ is known as a monotonic transformation. Letter grades are an example of what is known as an ordinal property, one which does not change no matter what monotonic transformation is applied. Cardinal properties, on the other hand, are properties which do change when a monotonic transformation is applied. Thus the students' numerical grades are cardinal properties.
It is important that we restrict ourselves to monotonic transformations; that is, we require that $g'(x) > 0$. To see why, suppose instead that the instructor used $g(x) = x^{-1}$. Then we obtain:
$$\begin{array}{cc} \text{John} & 75^{-1} = 0.0133\ \text{C} \\ \text{Mary} & 50^{-1} = 0.020\ \text{A} \\ \text{Joe} & 65^{-1} = 0.0154\ \text{B} \\ \text{Sue} & 85^{-1} = 0.0118\ \text{D} \end{array}$$
Now it is Mary who receives an A and Sue who receives a D, so the letter grades do change here. The problem is that:
$$g'(x) = -x^{-2} = -\frac{1}{x^2} < 0$$
and so $g(x) = \frac{1}{x}$ is not a monotonic transformation.
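The class-grades example can be checked directly by sorting the students under different transformations; this is an illustrative sketch (the dictionary below just restates the grades from the tables above):

```python
# Ordinal versus cardinal: the ranking survives the increasing map x -> x**2
# but is reversed by the decreasing map x -> 1/x.
grades = {"John": 75, "Mary": 50, "Joe": 65, "Sue": 85}

original = sorted(grades, key=lambda s: grades[s])
squared = sorted(grades, key=lambda s: grades[s] ** 2)
inverted = sorted(grades, key=lambda s: 1 / grades[s])

print(original == squared)   # True: x**2 is increasing here, ranking unchanged
print(original == inverted)  # False: 1/x is decreasing, ranking reversed
```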
2.5.2 Ordinal and Cardinal Properties of Functions
Let us now turn our attention to ordinal and cardinal properties of a function $f(x)$. We have:

Definition 63 Monotonic Transformation: If $g(x)$ is monotonic (i.e., $g'(x) > 0$ for all $x$) then applying $g(x)$ to $f(x)$ as $g(f(x))$ is called a monotonic transformation of $f(x)$.
We then have:

Definition 64 Ordinal Property: An ordinal property of a function $y = f(x)$ is one which does not change when any monotonic transformation $g(x)$ is applied to $f(x)$.

Definition 65 Cardinal Property: A cardinal property of a function $y = f(x)$ is one which does change when at least one monotonic transformation $g(x)$ is applied to $f(x)$.
With class grades the student with the highest grade (Sue) and the student with the lowest grade (Mary) are always the same no matter what monotonic $g(x)$ is applied. For functions, Sue and Mary correspond to the global maximum and minimum $x^*$ and so we have:

Theorem 66 The global maximum (global minimum) $x^*$ of $f(x)$ is an ordinal property. If $f(x) = g(h(x))$ with $g'(x) > 0$ then $x^*$ is a global maximum (minimum) of $f(x)$ if and only if $x^*$ is a global maximum (minimum) of $h(x)$.
Example: Consider:
$$f(x) = x - \frac{1}{2}x^2$$
for $0 < x < 2$. You can easily show that $f(x)$ has a global maximum at $x^* = 1$. Now suppose we apply the monotonic transformation $g(x) = x^2$, which leads us to:
$$h(x) = g(f(x)) = \left(x - \frac{1}{2}x^2\right)^2.$$
According to the theorem $x^* = 1$ is also a global maximum of $h(x)$. This can be seen from the plot of $f(x)$ and $h(x)$ below:

[Plot of $f(x) = x - \frac{1}{2}x^2$ and $h(x) = \left(x - \frac{1}{2}x^2\right)^2$ for $0 < x < 2$.]

2.5.3 Concavity and Convexity are Cardinal Properties
We have seen that a very important property of a function $f(x)$ is whether it is mountain-like or concave, or whether it is valley-like or convex. Now we might ask: is global concavity or convexity an ordinal or a cardinal property of $f(x)$? In other words, if $f(x)$ is globally concave (convex) does it follow that $g(f(x))$ is globally concave (convex) when $g(x)$ is monotonic? Surprisingly the answer is no!

Theorem 67 Concavity and Convexity are Cardinal Properties. If $f(x)$ is concave (convex) then it does not follow that a monotonic transformation $h(x) = g(f(x))$ is concave (convex).
Proof. Here we use proof by counter-example. Suppose for $x > 0$ that $f(x) = x^{\frac{1}{2}}$. $f(x)$ is globally concave since:
$$f''(x) = -\frac{1}{4}x^{-\frac{3}{2}} < 0.$$
Now suppose we let $g(x) = x^4$ so that $g'(x) = 4x^3 > 0$ and so $g(x)$ is monotonic. Then:
$$h(x) = g(f(x)) = \left(x^{\frac{1}{2}}\right)^4 = x^2.$$
But $h(x) = x^2$ is globally convex (since $h''(x) = 2 > 0$). Thus while $f(x)$ is concave, its monotonic transformation $h(x)$ is not concave (in fact it is convex).
More generally note that if $h(x) = g(f(x))$ then using the chain rule:
$$h'(x) = g'(f(x))f'(x)$$
$$\Rightarrow h''(x) = g''(f(x))\left(f'(x)\right)^2 + g'(f(x))f''(x).$$
We cannot show that $h''(x)$ and $f''(x)$ have the same sign because we do not know the sign of $g''(f(x))$; that is, we only make assumptions about the first derivative of $g(x)$ and not about the second derivative. This then is the basic reason why concavity and convexity are cardinal and not ordinal properties of a function.
2.5.4 Quasi-Concavity and Quasi-Convexity

Since concavity and convexity are cardinal properties, let us define a new kind of concavity and convexity, called quasi-concavity and quasi-convexity, which are ordinal properties of a function:
Definition 68 Quasi-Concavity: A function $f(x)$ is quasi-concave if and only if it is a monotonic transformation of a concave function; that is:
$$f(x) = g(h(x))$$
where $g'(x) > 0$ for all $x$ and $h(x)$ is globally concave.

Definition 69 Quasi-Convexity: A function $f(x)$ is quasi-convex if and only if it is a monotonic transformation of a convex function; that is:
$$f(x) = g(h(x))$$
where $g'(x) > 0$ and $h(x)$ is globally convex.

If $f(x)$ is convex (concave) then it is also quasi-convex (quasi-concave) since we can always let $g(x) = x$ (with $g'(x) = 1 > 0$) in which case $f(x) = h(x)$. Thus:

Theorem 70 All convex functions are quasi-convex but not all quasi-convex functions are convex.

Theorem 71 All concave functions are quasi-concave but not all quasi-concave functions are concave.
If $g(x)$ is monotonic in
$$f(x) = g(h(x)) \qquad (2.1)$$
then it follows that $g(x)$ has an inverse function $\tilde g(x)$. If we apply $\tilde g(x)$ to both sides of (2.1) we free $h(x)$ from inside $g(x)$ and obtain:
$$h(x) = \tilde g(f(x)).$$
Thus if $f(x)$ is quasi-concave (quasi-convex) there exists a monotonic transformation of $f(x)$ which makes it concave (convex). We therefore have:
Theorem 72 A function $f(x)$ is quasi-concave (quasi-convex) if and only if there exists a monotonic transformation $\tilde g(x)$ such that:
$$h(x) = \tilde g(f(x))$$
is concave (convex).

Remark: There are thus two methods for showing that a function $f(x)$ is quasi-concave (quasi-convex). We can either 1) show that $f(x)$ is a monotonic transformation of a concave (convex) function, or 2) show that a monotonic transformation of $f(x)$ is concave (convex).
Example: Consider the function:
$$f(x) = \frac{1}{1 + x^2}.$$
This function is not globally concave since:
$$f''(x) = \frac{2\left(3x^2 - 1\right)}{\left(1 + x^2\right)^3}$$
so that $f(x)$ is convex, or $f''(x) > 0$, for $|x| > \frac{1}{\sqrt{3}}$.
We can however show that $f(x)$ is quasi-concave.
Using the first method we have:
$$f(x) = g(h(x))$$
with monotonic transformation $g(x) = \frac{1}{1-x}$ (with $g'(x) = \frac{1}{(1-x)^2} > 0$) and concave function $h(x) = -x^2$ (with $h''(x) = -2 < 0$) since:
$$f(x) = g(h(x)) = \frac{1}{1 - h(x)} = \frac{1}{1 - (-x^2)} = \frac{1}{1 + x^2}.$$
Using the second method, let $g(x) = -\frac{1}{x}$ be the monotonic transformation (since $g'(x) = \frac{1}{x^2} > 0$) that we apply to $f(x)$. We then obtain:
$$h(x) = g(f(x)) = -\frac{1}{\frac{1}{1+x^2}} = -\left(1 + x^2\right)$$
where $h(x) = -\left(1 + x^2\right)$ is globally concave since $h''(x) = -2 < 0$.
2.5.5 New Sufficient Conditions for a Global Maximum or Minimum
Suppose we have a function $f(x)$ that is quasi-concave (quasi-convex) so that $f(x) = g(h(x))$ where $h(x)$ is concave (convex). Suppose further that we have a solution to the first-order conditions $f'(x^*) = 0$. From the chain rule this implies that:
$$f'(x^*) = g'(h(x^*))h'(x^*) = 0 \Rightarrow h'(x^*) = 0.$$
Since $x^*$ is also a solution to the first-order conditions for $h(x)$ and since $h(x)$ is concave (convex), it follows that $x^*$ is a global maximum (minimum) for $h(x)$. Since a global maximum (minimum) is an ordinal property from Theorem 66, it follows that $x^*$ is a global maximum for $f(x)$ as well! Thus we have the following sufficient conditions for a global maximum (minimum):

Theorem 73 If $f(x)$ is quasi-concave and $x^*$ satisfies the first-order conditions $f'(x^*) = 0$ then $x^*$ is the unique global maximum of $f(x)$.

Theorem 74 If $f(x)$ is quasi-convex and $x^*$ satisfies the first-order conditions $f'(x^*) = 0$ then $x^*$ is the unique global minimum of $f(x)$.

Remark: Since all concave (convex) functions are quasi-concave (quasi-convex) but not all quasi-concave (quasi-convex) functions are concave (convex), these sufficient conditions for a global maximum (minimum) are more widely applicable than the earlier sufficient conditions that relied on concavity (convexity).
Example: We have seen that the function
$$f(x) = \frac{1}{1 + x^2}$$
is quasi-concave. From the first-order conditions we have:
$$f'(x^*) = -\frac{2x^*}{\left(1 + x^{*2}\right)^2} = 0 \Rightarrow x^* = 0.$$
Since $f(x)$ is quasi-concave we conclude that $x^* = 0$ is a global maximum. This is illustrated in the plot below:

[Plot of $f(x) = \frac{1}{1+x^2}$.]
2.6 Exponential Functions and Logarithms
Almost all of the functions we have considered so far involve terms of the form $x^a$ for some value of $a$. For some $a > 0$ consider reversing $a$ and $x$ to obtain a new kind of function:

Definition 75 Exponential Function: An exponential function takes the form:
$$f(x) = a^x$$
where $a > 0$ is referred to as the base.
Example: If we reverse the 2 and the $x$ in $x^2$ we obtain:
$$f(x) = 2^x$$
with $f(3) = 2^3 = 8$ and $f(-3) = 2^{-3} = \frac{1}{8}$. The exponential $f(x) = 2^x$ is illustrated below:

[Plot of $f(x) = 2^x$.]
Note from the graph that this function is non-negative, monotonic and convex.
In mathematics, as in economics, it turns out that there is a best base $a$ for exponentials $a^x$. This is the number $e$, defined by:
$$e \equiv 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots \quad\text{or}\quad e \equiv \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n.$$
One can show that the two definitions are equivalent and lead to:
$$e \approx 2.718281828.$$
Remark: The second definition of $e$ has an economic interpretation in terms of compound interest. If you put \$1 in the bank at $r = 1$ or 100% interest compounded annually, then after one year you will have $(1 + r)^1 = \$2$. Now suppose interest is compounded every $\frac{1}{n}$th of a year, so that for $n = 2, 3$ and $365$ (i.e., interest every 6 months, every 4 months and daily) you would receive respectively:
$$\left(1 + \frac{r}{2}\right)^2 = \left(1 + \frac{1}{2}\right)^2 = 2.25,$$
$$\left(1 + \frac{r}{3}\right)^3 = \left(1 + \frac{1}{3}\right)^3 = 2.3704,$$
$$\left(1 + \frac{r}{365}\right)^{365} = 2.7146.$$
Thus as interest is compounded more and more often the amount of money you receive converges on $e$ dollars, or approximately \$2.72, so that $e$ is the amount of interest you would receive from continuous compounding.
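The convergence of $\left(1 + \frac{1}{n}\right)^n$ to $e$ is easy to verify directly; this sketch reproduces the compounding figures in the remark:

```python
import math

# Value of $1 after one year at 100% interest compounded n times per year.
# As n grows, (1 + 1/n)**n approaches e = 2.718281828...
for n in (1, 2, 3, 365, 1_000_000):
    print(n, (1 + 1 / n) ** n)
# n = 2 gives 2.25, n = 3 gives about 2.3704, n = 365 gives about 2.7146
print(math.e)
```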
We have:

Definition 76 The exponential function to the base $e$, denoted $f(x) = e^x$ or $f(x) = \exp(x)$, is defined as:
$$e^x \equiv \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n.$$
Remark 1: If you get confused with $e^x$ when using, say, the chain rule, try rewriting the problem with $e^x$ replaced by $\exp(x)$ and think of the letters $\exp$ taking the place of $f$ as in $f(x)$.

Remark 2: It follows from the definition that $e^r$ is the amount of money you would obtain from investing \$1 at interest rate $r$ when interest is compounded continuously. This is why in economics you will often see expressions like $e^r$ for discounting and compound interest. For example one dollar at 10% interest, or $r = 0.1$, compounded continuously will give you after 1 year:
$$e^{0.1} = 1.1052.$$
Mathematically the most important reason for choosing $e$ as the base is that the derivative of $f(x) = e^x$ is also $e^x$, so that:

Theorem 77 If $f(x) = e^x$ then $f'(x) = e^x$.

Proof. (Informal) From the definition we have that $e_n(x) \to e^x$ as $n \to \infty$ where:
$$e_n(x) = \left(1 + \frac{x}{n}\right)^n.$$
To find the derivative of $e^x$, differentiate $e_n(x)$ and then let $n \to \infty$, so that:
$$e_n'(x) = \left(1 + \frac{x}{n}\right)^{n-1} = \frac{e_n(x)}{1 + \frac{x}{n}}.$$
Now since $\lim_{n\to\infty}\left(1 + \frac{x}{n}\right) = 1$ we have:
$$\frac{de^x}{dx} = \lim_{n\to\infty} e_n'(x) = \frac{\lim_{n\to\infty} e_n(x)}{\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)} = e^x.$$
Since $e^x > 0$ it follows that $f'(x) = e^x > 0$ and so the function $f(x)$ is monotonic. Furthermore since $f''(x) = e^x > 0$ it follows that $e^x$ is globally convex. These properties are illustrated in the plot below:

[Plot of $f(x) = e^x$.]
Theorem 78 The function $f(x) = e^x$ has the following properties:
1. $e^x > 0$ for all $x$
2. $e^x$ is defined for all $x$ (it has an unrestricted domain)
3. $e^x$ is globally increasing (i.e., $f'(x) = e^x > 0$)
4. $e^x$ is globally convex (i.e., $f''(x) = e^x > 0$)
5. $e^0 = 1$
6. $e^x e^y = e^{x+y}$
7. $\left(e^x\right)^y = e^{xy}$
8. $e^{-x} = \frac{1}{e^x}$.
Since $f(x) = e^x$ is monotonic, it follows that it has an inverse function, which is the logarithm to the base $e$, or $\ln(x)$:

Definition 79 The function $\ln(x)$ is the inverse function of $e^x$ so that:
$$e^{\ln(x)} = x, \quad \ln(e^x) = x.$$
The function $\ln(x)$ is plotted below:

[Plot of $\ln(x)$.]
Remark: Note from the graph that $\ln(x)$ is not defined for $x \le 0$.
We can use the fact that $\ln(x)$ is the inverse function of $e^x$ to prove that:

Theorem 80 The derivative of $\ln(x)$ is $\frac{d\ln(x)}{dx} = \frac{1}{x}$.
Proof. Since $\ln(x)$ is the inverse function of $e^x$ we have $e^{\ln(x)} = x$. Differentiating both sides with respect to $x$ and using the chain rule we have:
$$e^{\ln(x)}\frac{d\ln(x)}{dx} = 1 \Rightarrow x\frac{d\ln(x)}{dx} = 1 \Rightarrow \frac{d\ln(x)}{dx} = \frac{1}{x}$$
where the second step uses $e^{\ln(x)} = x$.
We also have:

Theorem 81 The function $\ln(x)$ has the following properties:
1. $\ln(xy) = \ln(x) + \ln(y)$
2. $\ln(x^y) = y\ln(x)$
3. $\ln(x)$ is defined only for $x > 0$ (it has a restricted domain)
4. $\ln(x)$ can take on both negative and positive values (it has an unrestricted range)
5. $\ln(x)$ is globally increasing
6. $\ln(x)$ is globally concave
7. $\ln(1) = 0$
8. $\ln\left(\frac{1}{x}\right) = -\ln(x)$.
Proof. The first follows from $e^{\ln(x)+\ln(y)} = e^{\ln(x)}e^{\ln(y)} = xy$ and then taking the $\ln$ of both sides. The second follows from $x^y = \left(e^{\ln(x)}\right)^y = e^{y\ln(x)}$ and then taking the $\ln$ of both sides. The third follows from $e^x > 0$ for all $x$, so that if $x < 0$ we would have the contradiction $e^{\ln(x)} = x < 0$. The fifth follows since $\frac{d\ln(x)}{dx} = \frac{1}{x} > 0$ for $x > 0$. The sixth follows since $\frac{d^2\ln(x)}{dx^2} = -\frac{1}{x^2} < 0$. To show the seventh note that $e^{\ln(1)} = 1 = e^0$. The final result follows from $\ln\left(x \times \frac{1}{x}\right) = \ln(x) + \ln\left(\frac{1}{x}\right) = \ln(1) = 0$.
Remark: The function $\ln(x)$ gets used a lot in economics. For example in applied econometrics, rather than working directly with a price $P$ one usually works with $\ln(P)$. One of the reasons for this is that $\ln(x)$ converts multiplication into addition, and converts powers into multiplication.
Example 1: Suppose you had data on $Q$ and $P$ and wished to estimate a constant elasticity demand curve $Q = AP^\beta$. Since this is a non-linear function you cannot directly apply linear regression to your data. However using the properties of the $\ln$ function we obtain:
$$Q = AP^\beta \Rightarrow \ln(Q) = \ln\left(AP^\beta\right) = \ln(A) + \ln\left(P^\beta\right) \Rightarrow q = \alpha + \beta p$$
where $q = \ln(Q)$, $\alpha = \ln(A)$ and $p = \ln(P)$. You now have a linear relationship between $q$ and $p$ which can be estimated by linear regression. Furthermore the coefficient on the regressor $p$ is the elasticity of demand $\beta$.
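The transformation can be sketched numerically. The data below are invented so that $Q = 100P^{-0.5}$ holds exactly; regressing $q = \ln Q$ on $p = \ln P$ should then recover the elasticity $\beta = -0.5$:

```python
import math

# Estimating the elasticity beta in Q = A * P**beta by regressing
# q = ln(Q) on p = ln(P).  Hypothetical data satisfying Q = 100 * P**(-0.5).
P = [1.0, 2.0, 4.0, 8.0]
Q = [100 * price ** -0.5 for price in P]

p = [math.log(v) for v in P]
q = [math.log(v) for v in Q]

# Simple least squares slope: cov(p, q) / var(p).
n = len(p)
p_bar, q_bar = sum(p) / n, sum(q) / n
num = sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q))
den = sum((pi - p_bar) ** 2 for pi in p)
beta = num / den
print(beta)  # -0.5 (up to floating point)
```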
Example 2: Consider the function
$$y = f(x) = x^3 e^{-x}$$
for $x \ge 0$, which is plotted below:

[Plot of $y = x^3 e^{-x}$.]
To find the maximum of this function take the first-order conditions (using the product and chain rules) to obtain:
$$f'(x) = 3x^2 e^{-x} - x^3 e^{-x} = x^2 e^{-x}(3 - x)$$
$$\Rightarrow f'(x^*) = x^{*2}e^{-x^*}(3 - x^*) = 0 \Rightarrow (3 - x^*) = 0$$
which has a solution $x^* = 3$. Note that $x^{*2} > 0$ since $x > 0$, and $e^{-x^*} > 0$ since $e^x > 0$ for all $x$.
Here we cannot show that $x^* = 3$ is a global maximum by showing global concavity, since $f(x)$ is not globally concave. This follows since:
$$f''(x) = xe^{-x}\left(x^2 - 6x + 6\right) = xe^{-x}\left(x - \left(3 - \sqrt{3}\right)\right)\left(x - \left(3 + \sqrt{3}\right)\right)$$
and so $f(x)$ is concave in the interval $3 - \sqrt{3} < x < 3 + \sqrt{3}$ and convex outside this interval.
We can show however that $f(x)$ is quasi-concave since:
$$f(x) = g(h(x))$$
where $g(x) = e^x$ is monotonic and $h(x) = 3\ln(x) - x$ is globally concave since:
$$h''(x) = -\frac{3}{x^2} < 0.$$
It follows that $x^* = 3$ is a global maximum for both $h(x)$ and $f(x)$.
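A coarse grid search confirms the result; this is an illustrative numeric check, not part of the text's argument:

```python
import math

# Check numerically that x* = 3 maximizes f(x) = x**3 * exp(-x) on x > 0.
def f(x):
    return x ** 3 * math.exp(-x)

grid = [i / 100 for i in range(1, 2001)]  # 0.01, 0.02, ..., 20.00
best = max(grid, key=f)
print(best)  # 3.0
```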
Example 3: The standard normal distribution, easily the most important probability distribution, has a probability density given by:
$$p(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$$
which is plotted below:

[Plot of $p(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$.]
Note from the graph that $p(x)$ appears to be symmetric around 0. This is in fact the case, since $p(-x) = p(x)$:
$$p(-x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(-x)^2} = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} = p(x).$$
The mode of $p(x)$ (the maximum value of $p(x)$) is at $x^* = 0$. To show this we use the chain rule to obtain the first-order conditions as:
$$p'(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} \times (-x)$$
$$\Rightarrow p'(x^*) = 0 \Rightarrow \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^{*2}} \times (-x^*) = 0 \Rightarrow x^* = 0.$$
We might try to show that $x^* = 0$ is a global maximum by showing that $p(x)$ is globally concave. We have however that:
$$p''(x) = -\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} + \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} \times x^2 = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\left(x^2 - 1\right)$$
from which it follows that $p(x)$ is concave for $-1 < x < 1$ but convex for $x > 1$ or $x < -1$. It follows then that $p(x)$ is not globally concave.
We can however show that $p(x)$ is quasi-concave since:
$$p(x) = \exp\left(\ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}x^2\right)$$
with monotonic function $g(x) = \exp(x)$ and
$$h(x) = \ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}x^2, \quad h''(x) = -1 < 0$$
so that $h(x)$ is globally concave. It follows then that $p(x)$ is quasi-concave, so that $x^* = 0$ is a global maximum.
It will sometimes occur that you are confronted with exponential functions which do not have $e$ as the base, and less frequently logarithms which are not to the base $e$, such as $\log_{10}(x)$. Given such problems the best strategy is to convert the problem from base $a$ to base $e$ using:

Theorem 82 The functions $a^x$ and $\log_a(x)$ can be converted to base $e$ using:
1. $a^x = e^{\ln(a)x}$
2. $\log_a(x) = \ln(x)/\ln(a)$.

Proof. Since $a = e^{\ln(a)}$ we have $a^x = \left(e^{\ln(a)}\right)^x = e^{\ln(a)x}$. To derive the second result use $x = a^{\log_a(x)} = \left(e^{\ln(a)}\right)^{\log_a(x)} = e^{\ln(a)\log_a(x)}$ and take the $\ln$ of both sides.
Example 1: Given the function
$$y = 2^x$$
we can convert to base $e$ using:
$$2^x = \left(e^{\ln(2)}\right)^x = e^{\ln(2)x}.$$
It then follows, using the chain rule, that:
$$\frac{d}{dx}2^x = \ln(2)e^{\ln(2)x} = \ln(2)2^x.$$
Example 2: If you recall, we never defined $x^a$ for a non-integer $a$. In fact it is defined using $e^x$ and $\ln(x)$ as:
$$x^a \equiv e^{a\ln(x)}.$$
Thus the reason $x^a$ is not defined for $x < 0$ is that $\ln(x)$ is not defined. Using this definition we can prove that:
Theorem 83 For $x > 0$ and for any $a$ we have:
$$\frac{dx^a}{dx} = ax^{a-1}.$$
Proof. Using the definition of $x^a$ and the chain rule we find that:
$$\frac{d}{dx}x^a = \frac{d}{dx}e^{a\ln(x)} = a\frac{1}{x}e^{a\ln(x)} = ax^{-1}x^a = ax^{a-1}.$$
Example 3: For the function:
$$f(x) = x^x$$
we now have $x$ in both the base and the exponent! We have no direct rule for calculating derivatives of this function. We can however change from base $x$ to base $e$ as:
$$f(x) = x^x = \left(e^{\ln(x)}\right)^x = e^{x\ln(x)}.$$
Therefore using the chain and product rules yields:
$$f'(x) = (1 + \ln(x))e^{x\ln(x)}$$
$$\Rightarrow f'(x^*) = 0 = (1 + \ln(x^*))e^{x^*\ln(x^*)}$$
$$\Rightarrow 1 + \ln(x^*) = 0 \text{ since } e^{x^*\ln(x^*)} > 0$$
$$\Rightarrow \ln(x^*) = -1 \Rightarrow x^* = e^{-1} = 0.36788.$$
Furthermore $f(x)$ is globally convex since:
$$f''(x) = e^{x\ln(x)}\left((1 + \ln(x))^2 + \frac{1}{x}\right) > 0$$
and hence $x^* = e^{-1}$ is a global minimum. This is illustrated in the plot below:

[Plot of $f(x) = x^x$.]
Example 4: On your calculator you will see $\log_{10}(x)$, which is the logarithm to the base 10 instead of base $e$. To find its derivative we use:
$$\log_{10}(x) = \frac{1}{\ln(10)}\ln(x)$$
so that:
$$\frac{d\log_{10}(x)}{dx} = \frac{1}{\ln(10)x}.$$
2.6.1 Exponential Growth and the Rule of 72

Eighty percent of rules of thumb only apply 20 percent of the time.
- David Gunn
Suppose we replace $x$ by $t$, think of $t$ as time, and imagine that $y$ is some variable (population, GNP etc.) that grows with time so that $y = f(t)$.

Theorem 84 The growth rate of $y$ per unit period of $t$ (e.g. growth per year) is:
$$\lim_{\Delta t \to 0}\frac{f(t + \Delta t) - f(t)}{f(t)\Delta t} = \frac{f'(t)}{f(t)}.$$

Many economic variables appear to grow at approximately the same rate over time. For example, since the industrial revolution many advanced economies have grown at an average of around 2% a year.
There is a functional form which has the property that the growth rate remains constant over time. We have:
Theorem 85 The function $f(t) = Ae^{\mu t}$ grows at a constant rate $\mu$ for all $t$.

Proof. Using the chain rule and the properties of $e^x$ we have:
$$\frac{f'(t)}{f(t)} = \frac{\mu Ae^{\mu t}}{Ae^{\mu t}} = \mu.$$

Example: Thus if $t$ is measured in years and $\mu = 0.03$, then $y = Ae^{0.03t}$ grows at 3% every year.
One way of understanding the implications of different growth rates is the time it takes for $y$ to double. Let this be $\Delta t$, which then satisfies $f(t + \Delta t) = 2f(t)$ or:
$$Ae^{\mu(t + \Delta t)} = 2Ae^{\mu t}.$$
Solving for $\Delta t$ we find that:
$$\Delta t = \frac{\ln(2)}{\mu} = \frac{0.69315}{\mu} \approx \frac{72}{\mu \times 100\%}$$
which gives the rule of 72, where 72 is chosen because it is a nice number with lots of divisors that is not too far away from 69.
Thus if GNP grows at 2% a year, it will double approximately every $\frac{72}{2}$ or 36 years. On the other hand if GNP grows at 4% a year it will double every $\frac{72}{4}$ or 18 years.
This can make a huge difference since the economy that doubles every 18 years will be 4 times as large after 36 years, while the economy which grows at 2% will only be twice as large. Thus imagine two countries with identical GNP at $t = 0$ (say in 1945) but where one grows at 2%/year and the other grows at 4%/year:
[Plot of $e^{0.02t}$ and $e^{0.04t}$.]
Small differences in growth rates make a huge difference! As the above graph illustrates, after 55 years (the time from 1945 to 2000) the country that grows at 4%/year will have an economy three times as large as the economy that grows at 2%/year.
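The rule of thumb can be compared against the exact doubling time $\ln(2)/\mu$ with a short loop:

```python
import math

# Exact doubling time ln(2)/mu versus the rule-of-72 approximation 72/(100*mu).
for pct in (1, 2, 3, 4, 6, 8, 12):
    mu = pct / 100
    exact = math.log(2) / mu
    rule_of_72 = 72 / pct
    print(pct, round(exact, 1), rule_of_72)
# At 2% the exact doubling time is about 34.7 years versus 36 from the rule.
```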
2.7 Taylor Series
Although you may not be aware of it, calculus is really a method for approximating functions with polynomials. For example a derivative corresponds to the slope of a tangent line, this tangent line being a first degree polynomial. Second derivatives basically involve approximating a function with a second degree polynomial or quadratic.
The key concept which links these polynomial approximations with derivatives is the Taylor series.
Theorem 86 Taylor Series: A function $f(x)$ can be approximated at $x = x_0$ by an $n$th order polynomial $\tilde f(x)$, called a Taylor series, given by:
$$\tilde f(x) = f(x_0) + f^{(1)}(x_0)(x - x_0) + \frac{f^{(2)}(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n$$
where $f^{(n)}(x_0)$ is the $n$th derivative of $f(x)$ evaluated at $x = x_0$.

Remark: The approximation of $\tilde f(x)$ to $f(x)$ gets better the closer $x$ is to $x_0$. When $x = x_0$ the approximation becomes exact, that is $\tilde f(x_0) = f(x_0)$, since the terms $(x - x_0)^n$ become 0.
Example 1: The first-order Taylor series:
$$\tilde f(x) = f(x_0) + f'(x_0)(x - x_0)$$
approximates an arbitrary function $f(x)$ by a line.
Consider:
$$f(x) = x^2 + 9$$
and let us construct a first-order Taylor series around $x_0 = 1$. We need to calculate two numbers, $f(x_0)$ and $f'(x_0)$, as:
$$f(x) = x^2 + 9 \Rightarrow f(x_0) = f(1) = 1^2 + 9 = 10$$
$$f'(x) = 2x \Rightarrow f'(x_0) = f'(1) = 2 \times 1 = 2$$
so that:
$$\tilde f(x) = f(1) + f'(1)(x - 1) = 10 + 2(x - 1) = 8 + 2x.$$
To see how the approximation works, consider an $x$ close to $x_0 = 1$, say $x = 1.2$. Then we have:
$$f(1.2) = (1.2)^2 + 9 = 10.44$$
while the Taylor series approximation gives:
$$\tilde f(1.2) = 10 + 2(1.2 - 1) = 10.4.$$
On the other hand if $x$ is far from $x_0 = 1$, say $x = 10$, then:
$$f(10) = 10^2 + 9 = 109, \quad \tilde f(10) = 10 + 2(10 - 1) = 28$$
and $\tilde f(x)$ does a poor job of approximating $f(x)$.
A plot of $f(x) = x^2 + 9$ and its straight-line Taylor series approximation $\tilde f(x) = 10 + 2(x - 1)$ is given below:

[Plot of $f(x)$ and $\tilde f(x)$.]
Example 2: The second-order Taylor series is given by:
$$\tilde f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2$$
which approximates the function $f(x)$ at $x_0$ by a quadratic.
If
$$f(x) = x^3 + 9$$
then in order to calculate a second-order Taylor series around $x_0 = 1$ we need to calculate three numbers, $f(x_0)$, $f'(x_0)$ and $f''(x_0)$, which are given by:
$$f(x_0) = 1^3 + 9 = 10$$
$$f'(x) = 3x^2 \Rightarrow f'(x_0) = f'(1) = 3 \times 1^2 = 3$$
$$f''(x) = 6x \Rightarrow f''(x_0) = f''(1) = 6 \times 1 = 6$$
and so the second-order Taylor series is:
$$\tilde f(x) = f(1) + f'(1)(x - 1) + \frac{f''(1)}{2}(x - 1)^2 = 10 + 3(x - 1) + \frac{6}{2}(x - 1)^2 = 3x^2 - 3x + 10.$$
To see how the approximation does, let us first pick an $x$ close to $x_0 = 1$, say $x = 1.2$. We then have:
$$f(1.2) = (1.2)^3 + 9 = 10.728$$
while the Taylor series approximation gives:
$$\tilde f(1.2) = 10 + 3(1.2 - 1) + \frac{6}{2}(1.2 - 1)^2 = 10.72.$$
On the other hand if we choose an $x$ far from $x_0 = 1$, say $x = 7$, we obtain:
$$f(7) = 7^3 + 9 = 352, \quad \tilde f(7) = 10 + 3(7 - 1) + \frac{6}{2}(7 - 1)^2 = 136$$
and so we obtain a poor approximation.
A plot of the cubic $x^3 + 9$ and its quadratic second-order Taylor series approximation around $x_0 = 1$ is given below:

[Plot of $f(x)$ and $\tilde f(x)$.]
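The figures in the example can be reproduced with a short sketch:

```python
# Second-order Taylor approximation of f(x) = x**3 + 9 around x0 = 1,
# reproducing the example: accurate near x0, poor far away.
def f(x):
    return x ** 3 + 9

def f_tilde(x):  # f(1) + f'(1)*(x - 1) + f''(1)/2 * (x - 1)**2
    return 10 + 3 * (x - 1) + 3 * (x - 1) ** 2

print(f(1.2), f_tilde(1.2))  # near x0: about 10.728 versus 10.72
print(f(7), f_tilde(7))      # far from x0: 352 versus 136
```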
2.7.1 The Error of the Taylor Series Approximation

A natural question to ask is: how well does $\tilde f(x)$ approximate $f(x)$? The French mathematician Lagrange showed that the error of an $n$th order Taylor series approximation is equal to the $(n+1)$th term with $x_0$ replaced by $\bar x$, where $\bar x$ lies between $x$ and $x_0$. Thus:

Theorem 87 The error of the $n$th order Taylor series approximation is given by:
$$\frac{f^{(n+1)}(\bar x)(x - x_0)^{n+1}}{(n+1)!}$$
so that:
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + \frac{f^{(n+1)}(\bar x)(x - x_0)^{n+1}}{(n+1)!}$$
where $\bar x$ lies between $x_0$ and $x$.

Example: For the first-order Taylor series we have:
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(\bar x)}{2!}(x - x_0)^2$$
where $\bar x$ lies between $x_0$ and $x$.
To see how this can be used, let us now prove that a concave (convex) function has a unique global maximum (minimum) at $x^*$.

Proof. Suppose that $x^*$ solves the first-order conditions $f'(x^*) = 0$ and $f''(x) < 0$ for all $x$. A first-order Taylor series (with the error term) of $f(x)$ around $x_0 = x^*$ takes the form:
$$f(x) = f(x^*) + \underbrace{f'(x^*)}_{=0}(x - x^*) + \frac{f''(\bar x)}{2!}(x - x^*)^2 = f(x^*) + \frac{f''(\bar x)}{2!}(x - x^*)^2.$$
Now since $f(x)$ is concave (convex) it follows that $f''(x) < 0$ for all $x$ ($f''(x) > 0$ for all $x$) and hence that $f''(\bar x) < 0$ ($f''(\bar x) > 0$). If $x \neq x^*$ it follows that $(x - x^*)^2 > 0$ and hence, in the concave case, we have:
$$f(x) = f(x^*) + \underbrace{\frac{f''(\bar x)}{2!}(x - x^*)^2}_{< 0} < f(x^*).$$
This says that for any $x \neq x^*$, $f(x) < f(x^*)$, so that $x^*$ is a global maximum (minimum).
2.7.2 The Taylor Series for $e^x$ and $\ln(1+x)$
Consider calculating a Taylor series for $e^x$ around $x_0 = 0$. The $n$th term is of the form:
$$\frac{f^{(n)}(x_0)}{n!}(x - x_0)^n = \frac{x^n}{n!}$$
since if $f(x) = e^x$ then $f^{(n)}(x) = e^x$ and hence $f^{(n)}(0) = e^0 = 1$. It turns out that by letting $n \to \infty$ we obtain an exact result, which is often used to define $e^x$, as follows:

Theorem 88 The infinite order Taylor series for $e^x$ around $x_0 = 0$ is exact for all $x$ and is given by:
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots.$$
As an exercise, take the derivative of both sides and show that $f'(x) = e^x$.
Another important result is the Taylor series for $\ln(1+x)$:

Theorem 89 The Taylor series of $\ln(1+x)$ around $x_0 = 0$:
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$
is exact for $|x| < 1$.
Example: From the first-order Taylor series for $\ln(1+x)$ we have:
$$\ln(1+x) \approx x$$
for $x$ small. For example:
$$\ln(1 + 0.1) = 0.09531 \approx 0.1.$$
The Taylor series of \ln(1 + x) can be used to define an alternative measure of percentage change that is very useful in economics. Suppose you want to calculate the percentage change from x_1 to x_2. The usual way of doing this would be:

\frac{x_2 - x_1}{x_1} \times 100\%.

Thus if x_2 = 110 and x_1 = 100 the percentage change so defined is 10\%.
All definitions of percentage change suffer from the fact that the choice of the base is to some extent arbitrary. Thus instead of using 100 as the base we could equally well have used 110, or indeed any number between 100 and 110 such as
the midpoint 105. If for example we had used 110 as the base we would have as our definition of percentage change:

\frac{x_2 - x_1}{x_2} \times 100\%

which would lead to a percentage change of 9.0909\%.
Now consider instead defining the percentage change as:

(\ln(x_2) - \ln(x_1)) \times 100\%

or equivalently:

\ln\left(\frac{x_2}{x_1}\right) \times 100\%.

Using this definition we would get 9.531\%, which is intermediate between the two other definitions of percentage change.
To show why this third definition is sensible, use the first-order Taylor series approximation \ln(1 + x) \approx x, noting that:

\ln\left(\frac{x_2}{x_1}\right) = \ln\left(1 + \frac{x_2 - x_1}{x_1}\right) \approx \frac{x_2 - x_1}{x_1}.
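The three measures of percentage change can be computed side by side. A short Python sketch (variable names are ours) reproducing the 110-versus-100 example:

```python
import math

x1, x2 = 100.0, 110.0

pct_base_x1 = (x2 - x1) / x1 * 100   # base x1: 10.0%
pct_base_x2 = (x2 - x1) / x2 * 100   # base x2: about 9.0909%
pct_log = math.log(x2 / x1) * 100    # log measure: about 9.531%

# The log measure lies between the two base-dependent measures.
print(pct_base_x1, pct_base_x2, pct_log)
```

A further advantage of the log measure, implicit in its symmetry in x_1 and x_2, is that the change from x_1 to x_2 and back from x_2 to x_1 have the same magnitude with opposite signs.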
2.7.3 L'Hôpital's Rule
Consider the following problem. Suppose two functions f(x) and g(x) have the property that f(x_0) = 0 and g(x_0) = 0. We wish to find out what happens to \frac{f(x)}{g(x)} as x \to x_0. In general the ratio \frac{0}{0} is indeterminate, so it is not clear what the limit is. Consider using a first-order Taylor series for f(x) and g(x) around x_0, an approximation which will get better as x \to x_0. Since f(x_0) = g(x_0) = 0 we have:

\frac{f(x)}{g(x)} \approx \frac{f(x_0) + f'(x_0)(x - x_0)}{g(x_0) + g'(x_0)(x - x_0)} = \frac{f'(x_0)(x - x_0)}{g'(x_0)(x - x_0)} = \frac{f'(x_0)}{g'(x_0)}.
This yields L'Hôpital's rule:

Theorem 90 L'Hôpital's Rule I If f(x_0) = g(x_0) = 0 then:

\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}.

Another version of L'Hôpital's rule is:

Theorem 91 L'Hôpital's Rule II If f(x_0) = g(x_0) = \infty then:

\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}.
Remark: L'Hôpital's rule does not work if either f(x) or g(x) does not approach 0 or \infty.
Example: Suppose f(x) = x^2 - 1 and g(x) = x - 1 so that f(1) = g(1) = 0. Since f'(x) = 2x and g'(x) = 1 we have:

\lim_{x \to 1} \frac{x^2 - 1}{x - 1} = \lim_{x \to 1} \frac{2x}{1} = 2.
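The limit in this example can also be seen numerically by evaluating the ratio near, but not at, x_0 = 1. A small Python sketch (the function name `ratio` is ours):

```python
def ratio(x):
    """(x^2 - 1)/(x - 1), undefined at x = 1 itself."""
    return (x**2 - 1) / (x - 1)

# Approaching x0 = 1 from either side, the ratio tends to 2,
# matching lim 2x/1 = 2 from L'Hopital's rule.
for h in [0.1, 0.01, 0.001]:
    print(ratio(1 + h), ratio(1 - h))
```

Here ratio(1 + h) simplifies algebraically to 2 + h, so the printed values visibly approach the limit 2 as h shrinks.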
2.7.4 Newton's Method
Consider the problem of finding a root of a function f(x); that is, we wish to calculate an x^+ which satisfies:

f(x^+) = 0.

This is a very common problem in economics and econometrics. For example, suppose we wish to minimize or maximize a function g(x). Then we would want to calculate the root of g'(x); that is, the x^+ = x^* which satisfies the first-order conditions:

g'(x^*) = 0.

Solving for roots is easy to do for linear functions, quadratics and certain other special functions. Generally speaking, functions for which a formula exists for calculating a root are the exception.
Although there exists no general formula, there do exist numerical methods for calculating x^+. These numerical methods, combined with the use of computers, make the solving of these sorts of problems routine today.
The basic method was invented by Newton and involves approximating f(x) with a first-order Taylor series. The first step is to make an educated guess of what the root x^+ might be. Call this guess x_0 and approximate f(x) around x_0 using a first-order Taylor series so that:

f(x) \approx \tilde{f}(x) = f(x_0) + f'(x_0)(x - x_0).

Although we cannot solve f(x^+) = 0, it is easy to solve \tilde{f}(x) = 0 since \tilde{f}(x) is just a linear function. Let x_1 be the value of x that solves \tilde{f}(x) = 0, so that:

\tilde{f}(x_1) = 0 \implies f(x_0) + f'(x_0)(x_1 - x_0) = 0 \implies x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.

While x_1 is not a root of f(x), it will generally be closer to x^+ than x_0. To get an even better estimate we apply the same method again, but now using x_1; that is, we use:

f(x) \approx \tilde{f}(x) = f(x_1) + f'(x_1)(x - x_1)
so that solving \tilde{f}(x_2) = 0 we obtain:

x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.

The new guess x_2 will generally be closer to x^+ than x_1.
This procedure can be repeated again and again using:

x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}

until x_n is close enough to x^+. This is Newton's method, which is represented graphically below.
Example: Suppose you wish to find a root of:

f(x) = x^7 - 3x + 1

so that x^+ satisfies:

(x^+)^7 - 3x^+ + 1 = 0.

To apply Newton's method we will need:

f'(x) = 7x^6 - 3.

Let x_0 = 1 be our initial guess. Then since

f(1) = 1^7 - 3 \times 1 + 1 = -1, \quad f'(1) = 7 \times 1^6 - 3 = 4

we have:

\tilde{f}(x) = -1 + 4(x - 1)

so that:

\tilde{f}(x_1) = 0 \implies -1 + 4(x_1 - 1) = 0 \implies x_1 = 1.25.

Thus x_1 = 1.25 is our new guess of what x^+ is. Repeating the procedure with x_1 = 1.25 we find that

f(1.25) = 2.0184, \quad f'(1.25) = 23.703

so that:

\tilde{f}(x) = 2.0184 + 23.703(x - 1.25), \quad \tilde{f}(x_2) = 0 \implies x_2 = 1.1648.

Repeating this again with x_2 = 1.1648 we find that:

\tilde{f}(x) = f(1.1648) + f'(1.1648)(x - 1.1648), \quad \tilde{f}(x_3) = 0 \implies x_3 = 1.1362.

To obtain more precision we iterate one more time with x_3 = 1.1362, so that:

\tilde{f}(x) = 0.033453 + 12.06(x - 1.1362), \quad \tilde{f}(x_4) = 0 \implies x_4 = 1.1334.

The actual root, to 5 decimal places, is x^+ = 1.1332, so our solution x_4 = 1.1334 is off by 0.0002. For many purposes this is close enough, although further accuracy can be obtained by iterating further.
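The iteration above is mechanical enough to automate. A minimal Python sketch (the helper name `newton` and its tolerance defaults are ours) that reproduces the example:

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=100):
    """Iterate x_n = x_{n-1} - f(x_{n-1})/f'(x_{n-1}) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

f = lambda x: x**7 - 3*x + 1
fprime = lambda x: 7*x**6 - 3

root = newton(f, fprime, x0=1.0)
print(root)   # about 1.1332, matching the hand iteration in the text
```

Starting from other initial guesses, the same routine can be used to locate the other real roots discussed below.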
You might want to try to find the other two real roots, x = -1.2492 and x = 0.33349, which can be found by using other starting values (try the starting values x_0 = -1.5 and x_0 = 0 respectively). You can see all three roots graphically below:

[Figure: plot of f(x) = x^7 - 3x + 1 for -1.5 \leq x \leq 1.5, showing the three real roots]
2.8 Technical Issues

2.8.1 Continuity and Differentiability
Nothing is accomplished all at once, and it is one of my great maxims
... that nature makes no leaps.... This law of continuity declares that
we pass from the small to the great - and the reverse - through the
medium, in degree as well as in parts. -Leibniz
Natura non facit saltum (nature does not make a jump).
-On the title page of Alfred Marshall’s 1890 Principles of Economics
Not all functions are continuous, and not all functions have a derivative. For the purposes of intermediate economics these functions can often be ignored as mathematical freaks. Nevertheless there is some virtue in having in the back of your mind the idea that sometimes continuity and differentiability are issues.
Example 1: The function:

f(x) = \begin{cases} x^2 & \text{if } x \geq 2 \\ -5x & \text{if } x < 2 \end{cases}
plotted below:

[Figure: A Discontinuous Function]

is not continuous at x = 2 and so does not have a derivative or a slope at x = 2. Thus at x = 2 we really cannot say if f(x) is increasing or decreasing.
Example 2: In order to have a derivative a function must be continuous, but there are continuous functions that do not have derivatives. For example the function:

f(x) = -|x - 1|^{1/2}

which is plotted below:

[Figure: f(x) = -|x - 1|^{1/2}]

is continuous but does not have a derivative at x = 1. In particular f'(x) = \frac{1}{2\sqrt{|x-1|}} for x < 1 and f'(x) = -\frac{1}{2\sqrt{|x-1|}} for x > 1, so that f'(x) \to \pm\infty as x \to 1 from either the left or the right. The problem is that the function has a kink at x = 1, and so its derivative does not exist at this point.
2.8.2 Corner Solutions
In a more advanced treatment of the first-order conditions we would have stated that if x^* is a maximum and x^* does not lie on the boundary of the domain of f(x), then f'(x^*) = 0. If x^* lies on the boundary of the domain we have a corner solution, and it is not necessarily the case that f'(x^*) = 0.
To see why this might matter, note that since in economics prices and quantities are positive, we often require that the domain of f(x) satisfy x \geq 0. It follows that 0 is on the boundary of such a domain. Sometimes the maximum occurs at x^* = 0, as when a firm decides not to hire any labour or when a household does not buy any of a good. In this case the first-order conditions no longer require f'(x^*) = 0.

Example: Consider the problem of maximizing

f(x) = 10 - (x + 1)^2

where we restrict the domain of f(x) to be x \geq 0. If we use the first-order conditions we find that:

f'(x) = -2(x + 1) \implies f'(x^*) = 0 \implies x^* = -1.

Note however that x^* = -1 is not in the domain of f(x) since we require x \geq 0. So what value of x maximizes f(x) for x \geq 0? Consider the plot of f(x) below:
[Figure: f(x) = 10 - (x + 1)^2 for 0 \leq x \leq 1.4]
Note that f(x) is maximized at x^* = 0; that is, we have a maximum on the boundary of the domain of f(x). From the graph, or from:

f'(0) = -2(0 + 1) = -2 < 0

we see that at x^* = 0 we have f'(x^*) < 0.
A more systematic treatment of the first-order conditions at corner solutions leads to the Kuhn-Tucker conditions, which are better left for more advanced study.
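The corner solution in the example can be confirmed with a crude search over the restricted domain. A Python sketch (the grid resolution and names are our scaffolding, not part of the text):

```python
def f(x):
    """The objective from the example, restricted to x >= 0."""
    return 10 - (x + 1) ** 2

# Search only over the restricted domain x >= 0 (here x in [0, 2]).
grid = [i / 1000 for i in range(0, 2001)]
x_best = max(grid, key=f)
print(x_best, f(x_best))   # maximum at the corner x* = 0
```

Since f is strictly decreasing on x \geq 0, the search confirms the boundary point 0 as the maximizer even though f'(0) \neq 0.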
2.8.3 Advanced Concavity and Convexity
If you go on and do more advanced work you will find that our definitions of concavity and convexity are not entirely adequate for all purposes.
Consider for example the function:

f(x) = x^4

which is plotted below:

[Figure: y = x^4]

From the plot, f(x) certainly looks valley-like for all x and hence we would like to say that x^4 is globally convex. If we check the second derivative we have

f''(x) = 12x^2 \geq 0

but for x = 0 we have f''(0) = 0. Thus according to our definition of convexity, x^4 is not convex, since we require that f''(x) > 0 for all x.
Another problem arises with the absolute value function

y = |x|

which is plotted below:

[Figure: y = |x|]

Again this function clearly looks valley-like and hence we would like to say that it is convex. However f''(x) = 0 for x \neq 0 and f''(0) is not defined.
These problems can be dealt with by using more sophisticated definitions of concavity and convexity. In particular we have:

Definition 92 A function f(x) is convex if and only if for all x_1 and x_2 in the domain of f(x) and for all 0 \leq \lambda \leq 1:

x_3 = \lambda x_1 + (1 - \lambda) x_2

is in the domain of f(x) and

f(x_3) \leq \lambda f(x_1) + (1 - \lambda) f(x_2).

Definition 93 A function f(x) is concave if and only if for all x_1 and x_2 in the domain of f(x) and for all 0 \leq \lambda \leq 1:

x_3 = \lambda x_1 + (1 - \lambda) x_2

is in the domain of f(x) and

f(x_3) \geq \lambda f(x_1) + (1 - \lambda) f(x_2).

Remark 1: The definition says that if you draw a line between any two points on the graph of a convex function f(x) then the line falls everywhere above the graph of f(x), while if you draw a line between two points on the graph of a concave function the line falls below the graph. This is illustrated in the diagram below:
Remark 2: Note that both f(x) = |x| and f(x) = x^4 are convex according to the more advanced definition.
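The chord inequality in Definition 92 can be spot-checked numerically. A Python sketch (the sampling helper `chord_above` is ours; it tests the inequality at random points rather than proving it):

```python
import random

def chord_above(f, trials=10000, lo=-5.0, hi=5.0, eps=1e-9):
    """Sample the definition: f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2)."""
    for _ in range(trials):
        x1, x2 = random.uniform(lo, hi), random.uniform(lo, hi)
        lam = random.random()
        x3 = lam * x1 + (1 - lam) * x2
        if f(x3) > lam * f(x1) + (1 - lam) * f(x2) + eps:
            return False   # found a chord lying below the graph
    return True

print(chord_above(lambda x: x**4))   # x^4 passes the convexity test
print(chord_above(abs))              # |x| passes as well
```

Both functions pass even though the second-derivative test above rejects them, illustrating why the chord definition is the more useful one.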
Finally, in more advanced work one makes the distinction between strictly convex (concave) functions, which have no linear segments, and convex (concave) functions, which are allowed to have linear segments. Thus f(x) = |x| is convex but not strictly convex because it is linear to the right and left of 0, while f(x) = x^2, which has no linear segments, is strictly convex. In particular:

Definition 94 A function f(x) is strictly convex if and only if for all x_1 and x_2 in the domain of f(x) such that x_1 \neq x_2, and for all 0 < \lambda < 1:

x_3 = \lambda x_1 + (1 - \lambda) x_2

is in the domain of f(x) and

f(x_3) < \lambda f(x_1) + (1 - \lambda) f(x_2).
Definition 95 A function f(x) is strictly concave if and only if for all x_1 and x_2 in the domain of f(x) such that x_1 \neq x_2, and for all 0 < \lambda < 1:

x_3 = \lambda x_1 + (1 - \lambda) x_2

is in the domain of f(x) and

f(x_3) > \lambda f(x_1) + (1 - \lambda) f(x_2).

Note the more severe requirements in these definitions: that x_1 \neq x_2 and that \lambda = 0 and \lambda = 1 are not allowed. This means that all strictly convex (concave) functions are also convex (concave), but a convex (concave) function is not necessarily strictly convex (concave).
It also turns out that our definitions of quasi-concavity and quasi-convexity are flawed. For completeness you might also want to see the advanced definitions:

Definition 96 A function f(x) is quasi-concave if and only if for all x_1, x_2 in the domain of f(x) and for all 0 \leq \lambda \leq 1:

x_3 = \lambda x_1 + (1 - \lambda) x_2

is also in the domain of f(x) and:

f(x_3) \geq \min[f(x_1), f(x_2)].

Definition 97 A function f(x) is quasi-convex if and only if for all x_1, x_2 in the domain of f(x) and for all 0 \leq \lambda \leq 1:

x_3 = \lambda x_1 + (1 - \lambda) x_2

is also in the domain of f(x) and:

f(x_3) \leq \max[f(x_1), f(x_2)].
Chapter 3

Matrix Algebra

"Such is the advantage of a well constructed language that its simplified notation often becomes the source of profound theories." -Pierre-Simon Laplace
A matrix is basically just a table of numbers. For example the matrix A given by:

A = \begin{bmatrix} 50 & 75 \\ 35 & 65 \\ 65 & 85 \end{bmatrix}

could be the grades of three students over two exams. We implicitly work with matrices all the time when we work with data.
Matrix algebra is the art of manipulating matrices in a manner similar to manipulating ordinary numbers in ordinary algebra. Thus we will learn to add, subtract, multiply and divide matrices. It is even possible to calculate e^A or \ln(A) where A is a matrix.
In many ways matrix algebra is nothing more than a convenient notation. It is always possible in principle to avoid matrix algebra by working directly with the ordinary numbers inside the matrices. This is however rather like walking from Los Angeles to New York rather than flying! There are for example derivations in econometrics that might require five pages without matrix algebra but which can be performed in only a few lines using matrix algebra. Matrix algebra is a profound notation, one that allows you to see things that you would never see otherwise. Along with calculus, it is one of the two fundamental mathematical skills that a student of economics must acquire.
The cost of the power of matrix algebra is danger! Many of your instincts from ordinary algebra will lead you astray when you work with matrices. The classic example of this is that for matrices A \times B and B \times A are no longer the
same! For this reason you need to be careful in the beginning (and even later
on!) until you have developed reliable instincts.
We begin by defining a matrix:

Definition 98 Matrix: An m \times n matrix A with m rows and n columns takes the form:

A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}

where a_{ij} is the element in the i-th row and the j-th column of A.
Example: The 3 \times 2 matrix:

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} = \begin{bmatrix} 5 & 4 \\ 3 & 1 \\ 6 & 2 \end{bmatrix}

has a_{12} = 4, a_{21} = 3 and a_{32} = 2.
The case of square matrices and their diagonal elements will be of particular importance:

Definition 99 Square Matrix: An m \times n matrix A is a square matrix if m = n.

Definition 100 The Diagonal of a Square Matrix: Given an n \times n square matrix A = [a_{ij}], the diagonal elements are those elements a_{ij} for which i = j.

Example: A 2 \times 2 square matrix is:

\begin{bmatrix} 5 & 4 \\ 3 & 1 \end{bmatrix}.

The diagonal elements are a_{11} = 5 and a_{22} = 1.

Remark: Note that the diagonal goes from the top left-hand corner to the bottom right-hand corner. For our purposes there is nothing particularly interesting about the 'other diagonal', the one that goes from the top right-hand corner to the bottom left-hand corner.
Also of special importance in matrix algebra are vectors, which come in two flavors, and scalars:

Definition 101 Row Vector: A row vector x = [x_i] is a 1 \times n matrix.

Definition 102 Column Vector: A column vector x = [x_i] is an n \times 1 matrix.

Definition 103 Scalar: A scalar is a 1 \times 1 matrix, or just an ordinary number.

Example: Below x is a 3 \times 1 column vector, y is a 1 \times 3 row vector and z is a 1 \times 1 scalar:

x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad y = \begin{bmatrix} 5 & 4 & 2 \end{bmatrix}, \quad z = 3.

Remark: Any m \times n matrix A can be usefully thought of as consisting of n column vectors of dimension m \times 1 or m row vectors of dimension 1 \times n.
Example: The 3 \times 2 matrix:

A = \begin{bmatrix} 5 & 4 \\ 3 & 1 \\ 6 & 2 \end{bmatrix}

is made up of two column vectors:

\begin{bmatrix} 5 \\ 3 \\ 6 \end{bmatrix} \text{ and } \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}

or three row vectors:

\begin{bmatrix} 5 & 4 \end{bmatrix}, \begin{bmatrix} 3 & 1 \end{bmatrix} \text{ and } \begin{bmatrix} 6 & 2 \end{bmatrix}.

3.1 Matrix Addition and Subtraction
Your instincts from ordinary algebra are probably quite reliable for the addition and subtraction of matrices. The rules are very simple. If A and B are both m \times n matrices (of the same order) then:

Definition 104 If A = [a_{ij}] and B = [b_{ij}] are both m \times n matrices and C = A + B, then C is an m \times n matrix with C = [a_{ij} + b_{ij}].

Definition 105 If A = [a_{ij}] and B = [b_{ij}] are both m \times n matrices and C = A - B, then C is an m \times n matrix with C = [a_{ij} - b_{ij}].
Remark: The only way you are likely to go wrong here is if you try to add or subtract two matrices of different orders.
Example 1:

\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} + \begin{bmatrix} 5 & 3 \\ 8 & 3 \\ 9 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 7 \\ 6 & 4 \\ 15 & 3 \end{bmatrix}

\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} - \begin{bmatrix} 5 & 3 \\ 8 & 3 \\ 9 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ -10 & -2 \\ -3 & 1 \end{bmatrix}.
Example 2: The sum:

\begin{bmatrix} 3 & 4 \\ -2 & 1 \end{bmatrix} + \begin{bmatrix} 5 & 3 \\ 8 & 3 \\ 9 & 1 \end{bmatrix}

is not defined since the two matrices are not of the same order.
It is also possible to multiply any matrix by any scalar. Again the rule is very simple:

Definition 106 If C = \alpha A where \alpha is a scalar and A = [a_{ij}] is an m \times n matrix, then C is an m \times n matrix with C = [\alpha a_{ij}].

Example:

6 \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} = \begin{bmatrix} 6 \times 3 & 6 \times 4 \\ 6 \times (-2) & 6 \times 1 \\ 6 \times 6 & 6 \times 2 \end{bmatrix} = \begin{bmatrix} 18 & 24 \\ -12 & 6 \\ 36 & 12 \end{bmatrix}.
3.1.1 The Matrix 0

We will often come across matrices which have all elements equal to zero. We have:

Definition 107 In matrix algebra when we write A = 0 we mean that all elements of A are equal to 0.

Definition 108 In matrix algebra when we write A \neq 0 we mean that A is not the 0 matrix; that is, there exists at least one element of A which is not zero.
Example 1: Given the 3 \times 2 matrix A:

A = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}

if we subtract it from itself we obtain:

A - A = 0

or:

\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} - \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.

Note that here 0 is not the ordinary number 0 but a 3 \times 2 matrix of 0's; that is:

0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.

Thus if we were just to write A - A = 0 under the assumption that A is 3 \times 2, it is left implicit that the dimension of the 0 matrix is also 3 \times 2.
Example 2: If

A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 5 & 0 \end{bmatrix}

then we can legitimately write A \neq 0 since a_{31} \neq 0.
3.2 Matrix Multiplication

Unlike addition and subtraction, matrix multiplication is tricky and your instincts from ordinary algebra are likely unreliable. We begin with the simplest case, where we multiply a row and a column vector. We have:
Definition 109 Let a = [a_i] be a 1 \times n row vector and let b = [b_i] be an n \times 1 column vector. Then the product ab is a scalar given by:

ab \equiv \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix} \equiv a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. \tag{3.1}
Example: Given:

a = \begin{bmatrix} 1 & 3 & 6 \end{bmatrix} \text{ and } b = \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix},
then the product of these two matrices is:

ab = \begin{bmatrix} 1 & 3 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix} = 1 \times 2 + 3 \times 4 + 6 \times 7 = 56.

Remark: Here the order is important; that is, ab and ba are not equal! In the example above, while ab is the scalar 56, we shall see that ba is in fact a 3 \times 3 matrix given by:

ba = \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 6 & 12 \\ 4 & 12 & 24 \\ 7 & 21 & 42 \end{bmatrix}.
Now consider calculating AB where A and B are not vectors. Part of the trick is to think of A as a collection of row vectors and B as a collection of column vectors. The elements of AB are then found by multiplying the row vectors of A with the column vectors of B in the manner we have just learned.

Definition 110 If A is an m \times n matrix and B is an n \times s matrix, then to obtain AB write A as a collection of m row vectors and B as a collection of s column vectors as:

A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}, \quad B = \begin{bmatrix} b_1 & b_2 & \ldots & b_s \end{bmatrix}

where the 1 \times n row vector a_i is the i-th row of A and the n \times 1 column vector b_j is the j-th column of B. The product C = AB is then an m \times s matrix defined as:

C \equiv \begin{bmatrix} a_1 b_1 & a_1 b_2 & \cdots & a_1 b_s \\ a_2 b_1 & a_2 b_2 & \cdots & a_2 b_s \\ \vdots & \vdots & \ddots & \vdots \\ a_m b_1 & a_m b_2 & \cdots & a_m b_s \end{bmatrix}.
In order for the product AB to be defined, the number of columns in A must equal the number of rows in B. A recipe for determining if AB is defined, and then computing AB, is found below:

Recipe for Matrix Multiplication

Given two matrices A, which is m \times n, and B, which is r \times s, write the dimensions of the matrices in the order you wish to multiply them. Thus for AB we would write:

m \times n \,|\, r \times s.

We have:

1. The product AB is defined if and only if the two numbers in the middle are the same; that is, if n = r.

2. If 1. is satisfied so that AB is defined, then the dimension of AB is found by eliminating the two identical numbers n and r in the middle:

m \times \underbrace{n \,|\, r}_{\text{eliminate}} \times s \implies m \times s

so that AB is an m \times s matrix.

3. Write A as a collection of m row vectors and B as a collection of s column vectors. The i,j-th element of C = AB = [c_{ij}] is then found by multiplying the i-th row vector in A with the j-th column vector in B, so that c_{ij} = a_i b_j.
Example 1: Consider calculating AB for the two matrices:

A = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} \text{ and } B = \begin{bmatrix} 6 & 7 \\ 8 & 4 \\ 6 & 3 \end{bmatrix}.

Following the recipe we have:

1. Writing out the dimensions of AB as:

3 \times 2 \,|\, 3 \times 2

we see that the two inside numbers do not match (i.e., 2 \neq 3) and so the product AB is not defined. There is therefore no AB to calculate!
Example 2: Consider:

A = \begin{bmatrix} 3 & 4 \\ 2 & 1 \\ 6 & 2 \end{bmatrix} \text{ and } B = \begin{bmatrix} 5 & 2 & 1 \\ 3 & 3 & 4 \end{bmatrix}.

Following the recipe we have:
1. Writing out the dimensions of AB as:

3 \times 2 \,|\, 2 \times 3

we see that the two middle numbers match and so the product AB is defined.

2. Deleting the two middle numbers we find that:

3 \times 2 \,|\, 2 \times 3 \implies 3 \times 3

so that AB is a 3 \times 3 matrix.

3. To calculate AB we write A as a collection of 3 row vectors and B as a collection of 3 column vectors as:

A = \begin{bmatrix} \begin{bmatrix} 3 & 4 \end{bmatrix} \\ \begin{bmatrix} 2 & 1 \end{bmatrix} \\ \begin{bmatrix} 6 & 2 \end{bmatrix} \end{bmatrix}, \quad B = \begin{bmatrix} \begin{bmatrix} 5 \\ 3 \end{bmatrix} & \begin{bmatrix} 2 \\ 3 \end{bmatrix} & \begin{bmatrix} 1 \\ 4 \end{bmatrix} \end{bmatrix}.

Carrying out the multiplication we find that:

AB = \begin{bmatrix} \begin{bmatrix} 3 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \end{bmatrix} & \begin{bmatrix} 3 & 4 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} & \begin{bmatrix} 3 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \end{bmatrix} \\ \begin{bmatrix} 2 & 1 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \end{bmatrix} & \begin{bmatrix} 2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} & \begin{bmatrix} 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \end{bmatrix} \\ \begin{bmatrix} 6 & 2 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \end{bmatrix} & \begin{bmatrix} 6 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} & \begin{bmatrix} 6 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 27 & 18 & 19 \\ 13 & 7 & 6 \\ 36 & 18 & 14 \end{bmatrix}.

For example, to find the 2,3 element of AB we multiply:

\begin{bmatrix} 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = 2 \times 1 + 1 \times 4 = 6

while to obtain the 1,1 element we multiply:

\begin{bmatrix} 3 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \end{bmatrix} = 3 \times 5 + 4 \times 3 = 27.

You should repeat the calculation of the remaining elements on your own.
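The row-by-column recipe translates directly into a triple loop. A minimal Python sketch (the helper name `mat_mul` is ours) reproducing Example 2:

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x s matrix B, row-by-column."""
    m, n, s = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("inner dimensions do not match")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(s)]
            for i in range(m)]

A = [[3, 4], [2, 1], [6, 2]]
B = [[5, 2, 1], [3, 3, 4]]
print(mat_mul(A, B))   # [[27, 18, 19], [13, 7, 6], [36, 18, 14]]
```

The dimension check at the top mirrors step 1 of the recipe: it rejects exactly those pairs whose inner dimensions differ.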
Example 3: Consider reversing the order of the multiplication in the previous example and calculating BA, where A and B are given above. Following the recipe we have:

1. Since B is 2 \times 3 and A is 3 \times 2 we have:

2 \times 3 \,|\, 3 \times 2

so that the two inside numbers match and BA is defined.

2. Eliminating the two inside numbers we find that:

2 \times 3 \,|\, 3 \times 2 \implies 2 \times 2

so the resulting matrix will be a 2 \times 2 matrix.

3. Writing B as a collection of 2 row vectors and A as a collection of 2 column vectors as:

B = \begin{bmatrix} \begin{bmatrix} 5 & 2 & 1 \end{bmatrix} \\ \begin{bmatrix} 3 & 3 & 4 \end{bmatrix} \end{bmatrix}, \quad A = \begin{bmatrix} \begin{bmatrix} 3 \\ 2 \\ 6 \end{bmatrix} & \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix} \end{bmatrix}

we have:

BA = \begin{bmatrix} \begin{bmatrix} 5 & 2 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ 6 \end{bmatrix} & \begin{bmatrix} 5 & 2 & 1 \end{bmatrix}\begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix} \\ \begin{bmatrix} 3 & 3 & 4 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ 6 \end{bmatrix} & \begin{bmatrix} 3 & 3 & 4 \end{bmatrix}\begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 25 & 24 \\ 39 & 23 \end{bmatrix}.

Note that BA is 2 \times 2 while AB is 3 \times 3. This illustrates the important fact that even when both products exist, AB \neq BA. Note also that neither AA nor BB is defined.
3.2.1 The Identity Matrix

The identity matrix I in matrix algebra plays the same role as 1 in ordinary algebra.

Definition 111 Identity Matrix: The identity matrix I is an n \times n square matrix with ones along the diagonal and zeros on the off-diagonal.
Note that just as the number 1 has the property:

1 \times 5 = 5 \times 1 = 5

the identity matrix has the same property for matrices; that is:

Theorem 112 For all matrices:

IA = AI = A.

Example: The 3 \times 3 identity matrix is given by:

I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

and you can verify that:

\begin{bmatrix} 27 & 18 & 19 \\ 13 & 7 & 6 \\ 36 & 18 & 14 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 27 & 18 & 19 \\ 13 & 7 & 6 \\ 36 & 18 & 14 \end{bmatrix}.
3.3 The Transpose of a Matrix

It is very common in matrix algebra to reverse the rows and columns of a matrix, which results in the transpose of the matrix:

Definition 113 Transpose: If A = [a_{ij}] is an m \times n matrix then the transpose of A, denoted by A^T, is an n \times m matrix where the i,j-th element is a_{ji}, or:

A^T \equiv [a_{ji}].

Remark: A seemingly trivial but remarkably useful fact is that the transpose of a scalar is itself. For example, 5^T = 5.
Example:

\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}^T = \begin{bmatrix} 3 & -2 & 6 \\ 4 & 1 & 2 \end{bmatrix}

and

\begin{bmatrix} 5 & 2 & 1 \\ -3 & 3 & 4 \end{bmatrix}^T = \begin{bmatrix} 5 & -3 \\ 2 & 3 \\ 1 & 4 \end{bmatrix}.

Note that in the first case the transpose causes a 3 \times 2 matrix to become a 2 \times 3 matrix, while in the second it causes a 2 \times 3 matrix to become a 3 \times 2 matrix. We have:
Theorem 114 Transposes satisfy:

1. If AB is defined then (AB)^T = B^T A^T.
2. (A^T)^T = A.
3. (A + B)^T = A^T + B^T.

Remark: The first of these results is the trickiest. The key step in finding (AB)^T is that you must first reverse the order of multiplication before applying T to A and B.
3.3.1 Symmetric Matrices

Recall that for square matrices the diagonal goes from the top left-hand corner to the bottom right-hand corner. For example with:

A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{bmatrix}

the diagonal consists of the elements 1, 3 and 4. The off-diagonal elements are then those elements above and below the diagonal. Notice that in this example the elements above the diagonal mirror the elements below the diagonal. We call such matrices symmetric matrices.
The precise definition of a symmetric matrix is:

Definition 115 Symmetric Matrix: A matrix A is symmetric if and only if A = A^T.

Remark: Only square matrices can be symmetric, since if A is m \times n then A^T is n \times m and so A = A^T implies that m = n.
Example: The matrix A above is symmetric since A = A^T:

A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{bmatrix}^T = A^T.

However the matrices B and C below are not symmetric since:

B = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \\ 7 & 6 & 4 \end{bmatrix} \neq B^T = \begin{bmatrix} 1 & 2 & 7 \\ 2 & 3 & 6 \\ 5 & 6 & 4 \end{bmatrix}

and

C = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 3 & 6 \end{bmatrix} \neq C^T = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 5 & 6 \end{bmatrix}.
3.3.2 Proof that A^T A is Symmetric
In general it is not possible to take powers of matrices. Thus if A is an m \times n matrix then the square of A, or A^2 = AA, is not defined unless n = m: squares only exist for square matrices. However we can always "square" A as A^T A, which is an n \times n matrix, or as AA^T, which is an m \times m matrix.
Matrices such as A^T A and AA^T turn out to be important in econometrics. These matrices are always symmetric! Thus:

Theorem 116 The matrices A^T A and AA^T are symmetric.

Proof. In general to prove symmetry we begin with C^T and try to show that it is equal to C. Thus if C = A^T A then:

C^T = (A^T A)^T \quad \text{(definition)}
= A^T (A^T)^T \quad \text{(since } (DE)^T = E^T D^T \text{)}
= A^T A \quad \text{(since } (D^T)^T = D \text{)}
= C \quad \text{(definition)}

\implies C is symmetric.
You can show that AA^T is symmetric in the same way, or use the above result: if D = AA^T then D = B^T B where B = A^T, and so D has the form B^T B and hence is symmetric.
Example: Given:

A = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}

we have that A^T A and AA^T are symmetric, as predicted by the Theorem, since:

A^T A = \begin{bmatrix} 3 & -2 & 6 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} = \begin{bmatrix} 49 & 22 \\ 22 & 21 \end{bmatrix}

and

AA^T = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} \begin{bmatrix} 3 & -2 & 6 \\ 4 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 25 & -2 & 26 \\ -2 & 5 & -10 \\ 26 & -10 & 40 \end{bmatrix}.
3.4 The Inverse of a Matrix

Just as with ordinary numbers we will want to divide with matrices. With ordinary numbers we can express division a \div b using multiplication and the inverse as a \div b \equiv a \times b^{-1}. Now replacing a and b with two matrices A and B, we already know how to multiply them as A \times B, so if we can find the analogue of B^{-1}, the inverse of a matrix, we will be able to extend division to matrices as A \div B \equiv A \times B^{-1}.
Returning to ordinary numbers, the inverse of 3 is \frac{1}{3}, which satisfies 3 \times \frac{1}{3} = \frac{1}{3} \times 3 = 1. Now in matrix algebra the role of 1 is played by the identity matrix I, and so we have:

Definition 117 Matrix Inverse: The inverse of a square n \times n matrix A is an n \times n matrix, denoted by A^{-1}, which satisfies:

A^{-1} A = A A^{-1} = I.

Remark 1: Generally in matrix algebra we write A \times B^{-1} rather than A \div B.

Remark 2: With ordinary numbers we often express division as a \div b \equiv \frac{a}{b}. The notation \frac{a}{b} works for ordinary numbers since the order of multiplication does not matter; that is, a \times b^{-1} = b^{-1} \times a \equiv \frac{a}{b}. For matrices the order of multiplication does matter, since in general A \times B^{-1} \neq B^{-1} \times A. Thus \frac{A}{B} is bad notation for matrices, since it does not indicate whether you mean A \times B^{-1} or B^{-1} \times A. Thus for two matrices A and B do not write \frac{A}{B}.
Remark 3: A matrix must be square to have an inverse. For example:

\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}^{-1}

is not defined.

Remark 4: Not all square matrices have an inverse. For example, the scalar 0 does not have an inverse, nor does any n \times n square matrix 0 have an inverse, since 0A = A0 = 0 for all matrices; but if A were the inverse of 0 we would have A0 = I, a contradiction.
There are also square matrices with non-zero elements which do not have an inverse. We use the following terminology:

Definition 118 Non-Singular Matrices: If a matrix A has an inverse we say that A is non-singular or invertible.

Definition 119 Singular Matrices: If a matrix A does not have an inverse we say A is singular or non-invertible.
Example 1: An example of a non-singular matrix is:

\begin{bmatrix} 49 & 22 \\ 22 & 21 \end{bmatrix}

which has an inverse:

\begin{bmatrix} 49 & 22 \\ 22 & 21 \end{bmatrix}^{-1} = \frac{1}{545} \begin{bmatrix} 21 & -22 \\ -22 & 49 \end{bmatrix}

since:

\frac{1}{545} \begin{bmatrix} 21 & -22 \\ -22 & 49 \end{bmatrix} \begin{bmatrix} 49 & 22 \\ 22 & 21 \end{bmatrix} = \frac{1}{545} \begin{bmatrix} 545 & 0 \\ 0 & 545 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

which you can verify on your own by carrying out the multiplication.
Example 2: A matrix with non-zero elements which does not have an inverse (i.e., is singular or non-invertible) is:

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}.

Proof. We use proof by contradiction. Assume to the contrary that a matrix B = A^{-1} exists. Since BA = I by the definition of an inverse, we have:

\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

Carrying out the multiplication we find, from multiplying the first row of B with the first column of A, that:

b_{11} + b_{12} = 1

while multiplying the first row of B with the second column of A gives:

2b_{11} + 2b_{12} = 0 \implies b_{11} + b_{12} = 0.

Combining these two results we obtain the contradiction 1 = 0. Thus A does not have an inverse.
Here are some useful results for inverses:

Theorem 120 If A has an inverse then it is unique.

Theorem 121 If A^{-1} exists then (A^{-1})^{-1} = A.

Theorem 122 If A^{-1} exists then (A^T)^{-1} = (A^{-1})^T.

Theorem 123 If A and B are non-singular matrices of the same order then:

(AB)^{-1} = B^{-1} A^{-1}.
Theorem 124 If A is a 2 \times 2 matrix then its inverse exists if and only if

a_{11} a_{22} - a_{12} a_{21} \neq 0

and is given by:

A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.

Example: The matrix:

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}

does not have an inverse (i.e., is singular) since:

a_{11} a_{22} - a_{12} a_{21} = 1 \times 2 - 2 \times 1 = 0.
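Theorem 124 is short enough to implement directly. A Python sketch (the helper name `inv_2x2` is ours), including the singularity check from the theorem:

```python
def inv_2x2(A):
    """2x2 inverse via Theorem 124; raises if a11*a22 - a12*a21 == 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

print(inv_2x2([[49, 22], [22, 21]]))   # (1/545) * [[21, -22], [-22, 49]]
```

Feeding it the singular matrix [[1, 2], [1, 2]] triggers the `ValueError`, matching the example above.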
Remark 1: Note the similarity of (AB)^T = B^T A^T and (AB)^{-1} = B^{-1} A^{-1}, where for both the order is reversed before applying either T or -1 to the individual matrices.

Remark 2: Later on we will see that for n = 2 the scalar a_{11} a_{22} - a_{12} a_{21} is the determinant of A, or:

\det[A] = a_{11} a_{22} - a_{12} a_{21}.

Remark 3: Note from Theorem 122 that we can always reverse the order of the transpose T and inverse -1. A consequence of this is that:

Theorem 125 If A is symmetric and A^{-1} exists, then A^{-1} is symmetric.

Proof. If A is symmetric then A^T = A. Now:

(A^{-1})^T = (A^T)^{-1} = A^{-1}

and so A^{-1} is symmetric.
Example 1: The symmetric matrix:

A = \begin{bmatrix} 9 & 3 \\ 3 & 2 \end{bmatrix}

has an inverse since:

a_{11} a_{22} - a_{12} a_{21} = 9 \times 2 - 3 \times 3 = 9 \neq 0

and A^{-1} is given by:

\begin{bmatrix} 9 & 3 \\ 3 & 2 \end{bmatrix}^{-1} = \frac{1}{9} \begin{bmatrix} 2 & -3 \\ -3 & 9 \end{bmatrix} = \begin{bmatrix} \frac{2}{9} & -\frac{1}{3} \\ -\frac{1}{3} & 1 \end{bmatrix}

which is also symmetric.
3.4.1 Diagonal Matrices
Generally speaking, multiplying and inverting matrices is difficult and best left to computers. There is at least one important special case for which multiplication and inversion is easy. We have:

Definition 126 Diagonal Matrix: If A = [a_{ij}] is an n \times n matrix with a_{ij} = 0 for i \neq j, or

A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}

then A is a diagonal matrix.
Example: For the matrices below:
$$\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad
\begin{bmatrix} 3 & 0 & 7 \\ 4 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad
\begin{bmatrix} 3 & 0 & 6 \\ 4 & 2 & 0 \\ 3 & 0 & 4 \end{bmatrix}$$
the first is a diagonal matrix while the second and third are not.
Diagonal matrices are easy to multiply: you just multiply the corresponding
diagonal elements. Thus:

Theorem 127 If $A$ and $B$ are diagonal matrices of the same order then:
$$A \times B = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}
\begin{bmatrix} b_{11} & 0 & \cdots & 0 \\ 0 & b_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & b_{nn} \end{bmatrix}
= \begin{bmatrix} a_{11}b_{11} & 0 & \cdots & 0 \\ 0 & a_{22}b_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn}b_{nn} \end{bmatrix}.$$
Remark: Note that for diagonal matrices $AB = BA$!

Example: Given:
$$A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 7 \end{bmatrix}$$
we have:
$$AB = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 7 \end{bmatrix}
= \begin{bmatrix} 5 \times 2 & 0 & 0 \\ 0 & 3 \times 6 & 0 \\ 0 & 0 & 4 \times 7 \end{bmatrix}
= \begin{bmatrix} 10 & 0 & 0 \\ 0 & 18 & 0 \\ 0 & 0 & 28 \end{bmatrix}.$$
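The same computation can be sketched in numpy (my illustration, not from the text); note that the full matrix product reduces to an elementwise product of the diagonals, and that the two diagonal matrices commute:

```python
import numpy as np

A = np.diag([2.0, 3.0, 4.0])
B = np.diag([5.0, 6.0, 7.0])

# The full matrix product equals the elementwise product of the diagonals
AB = A @ B
print(np.diag(AB))  # [10. 18. 28.]

# Diagonal matrices commute: AB = BA
print(np.array_equal(AB, B @ A))  # True
```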
Finding the inverse of a diagonal matrix is also very easy: you merely take
the inverse of each element along the diagonal. Thus:

Theorem 128 A diagonal matrix $A$ is non-singular if and only if all its
diagonal elements are non-zero, in which case:
$$A^{-1} = \begin{bmatrix} \frac{1}{a_{11}} & 0 & \cdots & 0 \\ 0 & \frac{1}{a_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & \frac{1}{a_{nn}} \end{bmatrix}.$$
Example 1:
$$\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}^{-1}
= \begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{4} \end{bmatrix}.$$

Example 2: Since the identity matrix $I$ is diagonal with $1$'s along the diagonal,
it follows that $I^{-1} = I$.

Example 3: The diagonal matrix:
$$\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
is singular (or non-invertible, or it does not have an inverse) since the third
diagonal element is zero.
3.5 The Determinant of a Matrix

An important characteristic of a square matrix is its determinant. If $A$ is an
$n \times n$ matrix then we write its determinant as $|A|$ or $\det[A]$.
To begin, the determinant when $A$ is a $1 \times 1$ scalar or a $2 \times 2$ matrix is given
by:
Definition 129 If $A$ is a $1 \times 1$ scalar then $\det[A] = A$, while if $A = [a_{ij}]$ is a
$2 \times 2$ matrix:
$$\det[A] = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$
Example: $\det[5] = 5$, $\det[-3] = -3$, while:
$$\det\begin{bmatrix} 5 & 1 \\ 4 & 3 \end{bmatrix} = 5 \times 3 - 1 \times 4 = 11.$$
To define determinants properly for $n \geq 3$ is somewhat complicated since
it involves the concept of a permutation. Rather than going into this we will
instead use the Laplace expansion to reduce the calculation of an $n$th order
determinant to a series of $(n-1)$th order determinants. These $(n-1)$th order
determinants are called minors and are found by removing one row and one
column from a matrix.

Definition 130 Minors: The $i,j$th minor of a matrix $A$, denoted by $m_{ij}$, is
given by:
$$m_{ij} = \det[A_{ij}]$$
where $A_{ij}$ is the $(n-1) \times (n-1)$ matrix obtained by removing the $i$th row and
the $j$th column of $A$.

We then define the $i,j$th cofactor as either $m_{ij}$ if $i + j$ is even, or $-m_{ij}$ if
$i + j$ is odd. Thus:

Definition 131 Cofactors: The $i,j$th cofactor of a matrix $A$, denoted by $c_{ij}$,
is given by:
$$c_{ij} = (-1)^{i+j} m_{ij}$$
where $m_{ij}$ is the $i,j$th minor of $A$.
Example: Consider the $3 \times 3$ matrix:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}.$$
The $1,1$ minor $m_{11}$ is obtained by removing the first row and first column so
that
$$m_{11} = \det\begin{bmatrix} 2 & 6 \\ 1 & 8 \end{bmatrix} = 10.$$
Since $1 + 1 = 2$ is even, the $1,1$ cofactor is
$$c_{11} = (-1)^{1+1} \times 10 = 10.$$
To calculate the $3,2$ minor $m_{32}$ we remove the third row and the second column
of $A$ to obtain:
$$m_{32} = \det\begin{bmatrix} 3 & 4 \\ 1 & 6 \end{bmatrix} = 14$$
and since $3 + 2 = 5$ is odd, the cofactor is the negative of $m_{32}$:
$$c_{32} = (-1)^{3+2} \times 14 = -14.$$
Remark: When calculating the cofactors $c_{ij}$ there is a pattern of alternating
$1$'s and $-1$'s applied to the minors $m_{ij}$ that looks like this:
$$\begin{bmatrix} 1 & -1 & 1 & \cdots \\ -1 & 1 & -1 & \cdots \\ 1 & -1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$
Notice that the diagonal elements always have $1$. For example, with a $4 \times 4$
matrix this pattern is:
$$\begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix}.$$
The Laplace expansion then states that $\det[A]$ can be found by moving
across any row or down any column of $A$, multiplying each element in that row
or column $a_{ij}$ by its cofactor $c_{ij}$, and then summing.

Theorem 132 Laplace Expansion: Given an $n \times n$ matrix $A = [a_{ij}]$ with
cofactors $c_{ij}$, then $\det[A]$ is given either as the sum of the products of the
elements of the $i$th row with their cofactors as:
$$\det[A] = a_{i1}c_{i1} + a_{i2}c_{i2} + a_{i3}c_{i3} + \cdots + a_{in}c_{in}$$
or as the sum of the products of the elements of the $j$th column with their
cofactors as:
$$\det[A] = a_{1j}c_{1j} + a_{2j}c_{2j} + a_{3j}c_{3j} + \cdots + a_{nj}c_{nj}.$$
Here is a recipe for calculating a determinant:

Recipe for Calculating $\det[A]$
1. Pick any row or column of $A$ and move down that row or column.
2. When you get to a particular element $a_{ij}$, delete the corresponding row
and column, take the determinant of what is left over to obtain the minor
$m_{ij}$, and multiply the two as: $a_{ij} \times m_{ij}$.
3. Multiply the result in step 2 by either $-1$ or $1$ depending on whether $i + j$
is odd or even.
4. Continue to the next element in the row or column and add all the terms
you obtained in step 3 together.
Remark: In general, calculating determinants is difficult and best left to
computers, which use more efficient algorithms than the Laplace expansion. Unless
$A$ has some special properties, you are unlikely to have to calculate by hand
determinants larger than $4 \times 4$.
Example: Consider the $3 \times 3$ matrix:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}.$$
To calculate $\det[A]$ let us begin by going across the first row. Coming to
the first element $a_{11} = 3$ we remove the first row and first column, take the
determinant of what is left over and multiply by $a_{11} = 3$ to obtain:
$$(-1)^{1+1} \times 3 \times \det\begin{bmatrix} 2 & 6 \\ 1 & 8 \end{bmatrix} = 30.$$
Since the sum of the row and column indices of $a_{11}$ is $1 + 1 = 2$, which is even,
we have $(-1)^{1+1} = 1$ and so this term does nothing to the result.
We now move across the row to the next element $a_{12} = 1$. Removing the
corresponding row and column and taking the determinant we obtain:
$$(-1)^{1+2} \times 1 \times \det\begin{bmatrix} 1 & 6 \\ 3 & 8 \end{bmatrix} = 10.$$
Since $1 + 2 = 3$ is an odd number, the term $(-1)^{1+2} = -1$ and so changes the
sign of the result.
Finally we come to the last element of the row, $a_{13} = 4$. Removing the
corresponding row and column we obtain:
$$(-1)^{1+3} \times 4 \times \det\begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} = -20.$$
Since $1 + 3 = 4$ is even, the term $(-1)^{1+3} = 1$ and this term does nothing to
the result.
Thus adding all these results together we find that:
$$\det[A] = a_{11}c_{11} + a_{12}c_{12} + a_{13}c_{13} = 30 + 10 - 20 = 20.$$
Notice the pattern of pluses and minuses here is: $1, -1, 1$.
We could also have calculated $\det[A]$ above by going across the second row
as:
$$\det[A] = a_{21}c_{21} + a_{22}c_{22} + a_{23}c_{23}
= -1 \times (1) \times \det\begin{bmatrix} 1 & 4 \\ 1 & 8 \end{bmatrix}
+ (2) \times \det\begin{bmatrix} 3 & 4 \\ 3 & 8 \end{bmatrix}
+ -1 \times (6) \times \det\begin{bmatrix} 3 & 1 \\ 3 & 1 \end{bmatrix} = 20$$
(here the pattern of pluses and minuses is: $-1, 1, -1$) or by going down the
third column as:
$$\det[A] = a_{13}c_{13} + a_{23}c_{23} + a_{33}c_{33}
= 4 \times \det\begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}
- 6 \times \det\begin{bmatrix} 3 & 1 \\ 3 & 1 \end{bmatrix}
+ 8 \times \det\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} = 20.$$
Notice the pattern of pluses and minuses here is: $1, -1, 1$.
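The recipe above can be sketched directly as a recursive cofactor expansion. This is my illustration (not from the text) and is fine for small matrices, though far slower than the LU-based routines real software uses:

```python
import numpy as np

def det_laplace(A):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j, then recurse
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[3.0, 1.0, 4.0],
              [1.0, 2.0, 6.0],
              [3.0, 1.0, 8.0]])
print(det_laplace(A))    # 20.0
print(np.linalg.det(A))  # ~20.0
```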
Although determinants are hard to calculate numerically, there are a number
of results which make theoretical manipulations of determinants quite easy. In
particular:

Theorem 133 If $A$ and $B$ are square $n \times n$ matrices then
1. $\det[AB] = \det[A]\det[B]$.
2. $\det\left[A^T\right] = \det[A]$.
3. If $A$ is non-singular then $\det\left[A^{-1}\right] = \frac{1}{\det[A]}$.
4. If $B$ is obtained by switching any two rows or any two columns of $A$ then
$\det[B] = -\det[A]$.
5. If $B$ is obtained by adding one row or column of $A$ to another row or
column of $A$ then $\det[B] = \det[A]$.
6. If $\alpha$ is a scalar and $A$ is an $n \times n$ matrix then $\det[\alpha A] = \alpha^n \det[A]$.

One of the reasons we are interested in determinants is that they tell us
whether or not a matrix $A$ has an inverse; in particular a necessary and sufficient
condition for the inverse to exist is that the determinant not be $0$, or:

Theorem 134 Given an $n \times n$ matrix $A$, the inverse $A^{-1}$ exists if and only if
$\det[A] \neq 0$.

Theorem 135 Given an $n \times n$ matrix $A$, the inverse $A^{-1}$ does not exist if and
only if $\det[A] = 0$.
Remark: The only scalar that does not have an inverse is $0$. While there
are square matrices with non-zero elements that do not have an inverse, they
nevertheless have a zero-like quality; in particular their determinant must be
zero. Later we will see that non-invertible matrices must also have a $0$ eigenvalue.

Remark: From result 1 of the theorem it follows that if either $A$ or $B$ is
singular then $AB$ is also singular; that is, if either $\det[A] = 0$ or $\det[B] = 0$
then $\det[AB] = \det[A]\det[B] = 0$.
Example: For the matrix:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}$$
we showed that $\det[A] = 20$. It follows then that $A^{-1}$ exists. Without actually
calculating $A^{-1}$ we know that
$$\det\left[A^{-1}\right] = \frac{1}{\det[A]} = \frac{1}{20}.$$
We also know that $\det\left[A^T\right] = 20$. Suppose we multiplied every element of $A$ by
$2$ so that:
$$B = 2A = 2\begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}
= \begin{bmatrix} 6 & 2 & 8 \\ 2 & 4 & 12 \\ 6 & 2 & 16 \end{bmatrix}.$$
Then since $A$ is $3 \times 3$ it follows that:
$$\det[B] = 2^3 \det[A] = 8 \times 20 = 160.$$
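These determinant rules are easy to confirm numerically; the following check with numpy is my addition, not part of the text:

```python
import numpy as np

A = np.array([[3.0, 1.0, 4.0],
              [1.0, 2.0, 6.0],
              [3.0, 1.0, 8.0]])

d = np.linalg.det(A)
print(round(d, 6))  # 20.0

# det[A^T] = det[A]
assert np.isclose(np.linalg.det(A.T), d)
# det[A^-1] = 1/det[A]
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / d)
# det[2A] = 2^3 det[A] for a 3x3 matrix
assert np.isclose(np.linalg.det(2 * A), 8 * d)
```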
3.5.1 Determinants of Upper and Lower Triangular Matrices

Determinants are in general difficult to compute. Two types of matrices for
which determinants are easy to compute are upper and lower triangular
matrices:
Definition 136 Upper Triangular Matrix: An $n \times n$ matrix $A = [a_{ij}]$ is
an upper triangular matrix if it has all zeros below the diagonal, or:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & a_{n-1,n} \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}.$$

Definition 137 Lower Triangular Matrix: An $n \times n$ matrix $A = [a_{ij}]$ is
a lower triangular matrix if it has all zeros above the diagonal, or:
$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ a_{n1} & \cdots & a_{n,n-1} & a_{nn} \end{bmatrix}.$$
Remark: A diagonal matrix is both upper and lower triangular.
Determinants of triangular matrices are easy to calculate. We have:
Theorem 138 For either upper or lower triangular matrices the determinant
is the product of the diagonal elements.
From this it follows that:
Theorem 139 A lower or upper triangular matrix is non-singular if and only
if all diagonal elements are non-zero.
Example 1: Given:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 0 & 2 & 6 \\ 0 & 0 & 8 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 0 & 0 \\ 1 & 2 & 0 \\ 3 & 1 & 8 \end{bmatrix}, \quad
C = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix}$$
then $A$ is upper triangular, $B$ is lower triangular and $C$ is both upper and lower
triangular, that is, $C$ is a diagonal matrix. We have:
$$\det[A] = \det[B] = \det[C] = 3 \times 2 \times 8 = 48.$$
Example 2: The matrix:
$$D = \begin{bmatrix} 3 & 1 & 4 \\ 0 & 2 & 6 \\ 0 & 0 & 0 \end{bmatrix}$$
does not have an inverse since $\det[D] = 0$.
Example 3: Since the identity matrix is a diagonal matrix it follows that
$\det[I] = 1$. We can use this to prove that if $\det[A] = 0$ then $A$ does not have
an inverse. The proof is by contradiction. Suppose then that $A^{-1}$ existed and
$\det[A] = 0$. Then:
$$1 = \det[I] = \det\left[AA^{-1}\right] = \underbrace{\det[A]}_{=0}\det\left[A^{-1}\right] = 0$$
so that $1 = 0$, a contradiction. It follows then that $A^{-1}$ does not exist if
$\det[A] = 0$.
3.5.2 Calculating the Inverse of a Matrix with Determinants

Determinants can be used to calculate the inverse of a matrix using the cofactor
and adjoint matrices defined as:
Definition 140 Cofactor Matrix: Let $A$ be an $n \times n$ square matrix and define
the $n \times n$ cofactor matrix $C = [c_{ij}]$ where $c_{ij}$ is the $i,j$th cofactor of $A$.

Definition 141 Adjoint Matrix: The adjoint matrix of $A$, written as $\mathrm{adj}[A]$,
is defined as the transpose of the cofactor matrix $C$, or:
$$\mathrm{adj}[A] = C^T.$$

The following result holds for the adjoint matrix:

Theorem 142 For any square matrix $A$:
$$\mathrm{adj}[A] \times A = A \times \mathrm{adj}[A] = \det[A]\, I.$$
Remark 1: If you carry out the matrix multiplication $A \times \mathrm{adj}[A]$ for the $i$th row
of $A$ and the $i$th column of $\mathrm{adj}[A]$ and equate this to the $i,i$ element of $\det[A]\, I$,
which is just $\det[A]$, you will see that this is just the Laplace expansion for
$\det[A]$. The result states further that the $i$th row of $A$ is orthogonal to the $j$th
column of $\mathrm{adj}[A]$ for $j \neq i$.
The adjoint matrix $\mathrm{adj}[A]$ is nearly the inverse $A^{-1}$, since $AA^{-1} = I$ while
$A \times \mathrm{adj}[A] = \det[A]\, I$. We thus have:
Theorem 143 If $\det[A] \neq 0$ then:
$$A^{-1} = \frac{1}{\det[A]}\, \mathrm{adj}[A].$$
Example 1: For the case of $2 \times 2$ matrices:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
since $\det[A] = a_{11}a_{22} - a_{12}a_{21}$, and the cofactor and adjoint matrices are:
$$C = \begin{bmatrix} a_{22} & -a_{21} \\ -a_{12} & a_{11} \end{bmatrix} \quad \text{and} \quad
\mathrm{adj}[A] = C^T = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$
it follows that:
$$A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.$$

Example 2: Consider:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}$$
which we showed earlier had a determinant of $\det[A] = 20$. The cofactor matrix
is given by:
$$C = \begin{bmatrix} 10 & 10 & -5 \\ -4 & 12 & 0 \\ -2 & -14 & 5 \end{bmatrix}.$$
For example, the $3,2$ element is calculated as the cofactor:
$$c_{32} = (-1)^{3+2} \det\begin{bmatrix} 3 & 4 \\ 1 & 6 \end{bmatrix} = -14.$$
The adjoint matrix is then found by taking the transpose of $C$ so that:
$$\mathrm{adj}[A] = \begin{bmatrix} 10 & 10 & -5 \\ -4 & 12 & 0 \\ -2 & -14 & 5 \end{bmatrix}^T
= \begin{bmatrix} 10 & -4 & -2 \\ 10 & 12 & -14 \\ -5 & 0 & 5 \end{bmatrix}.$$
Note that $A \times \mathrm{adj}[A] = \det[A]\, I$ is satisfied since:
$$\begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}
\begin{bmatrix} 10 & -4 & -2 \\ 10 & 12 & -14 \\ -5 & 0 & 5 \end{bmatrix}
= \begin{bmatrix} 20 & 0 & 0 \\ 0 & 20 & 0 \\ 0 & 0 & 20 \end{bmatrix}
= 20 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Thus the inverse of $A$ is:
$$A^{-1} = \frac{1}{20}\begin{bmatrix} 10 & -4 & -2 \\ 10 & 12 & -14 \\ -5 & 0 & 5 \end{bmatrix}
= \begin{bmatrix} \frac{1}{2} & -\frac{1}{5} & -\frac{1}{10} \\ \frac{1}{2} & \frac{3}{5} & -\frac{7}{10} \\ -\frac{1}{4} & 0 & \frac{1}{4} \end{bmatrix}.$$
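The cofactor-transpose construction can be sketched in a few lines of numpy (an illustration of mine, not from the text); the helper name `adjugate` is my choice:

```python
import numpy as np

def adjugate(A):
    """Cofactor-transpose (adjoint) matrix of a square matrix A."""
    n = A.shape[0]
    C = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T  # adj[A] = C^T

A = np.array([[3.0, 1.0, 4.0],
              [1.0, 2.0, 6.0],
              [3.0, 1.0, 8.0]])

adjA = adjugate(A)
# A @ adj[A] = det[A] * I: 20 on the diagonal, 0 elsewhere
print(np.round(A @ adjA))
# A^-1 = adj[A] / det[A]
print(np.allclose(adjA / np.linalg.det(A), np.linalg.inv(A)))  # True
```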
3.6 The Trace of a Matrix

Besides determinants, another important characteristic of square matrices,
especially in econometrics, is the sum of the diagonal elements, or the trace,
defined as:
Definition 144 Trace: If $A$ is a square matrix then the trace of $A$, denoted
by $\mathrm{tr}[A]$, is:
$$\mathrm{tr}\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & a_{n-1,n} \\ a_{n1} & \cdots & a_{n,n-1} & a_{nn} \end{bmatrix} = a_{11} + a_{22} + \cdots + a_{nn}.$$
Example:
$$\mathrm{tr}\begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix} = 3 + 2 + 8 = 13.$$
Two important results to remember when manipulating traces are:

Theorem 145 $\mathrm{tr}[A + B] = \mathrm{tr}[A] + \mathrm{tr}[B]$.

Theorem 146 $\mathrm{tr}[AB] = \mathrm{tr}[BA]$.

Remark: The second property is often very useful in econometrics. We
know that for matrices in general $AB \neq BA$. Inside the trace operator, however,
we are free to reverse the order of multiplication.
Example 1: Note that:
$$\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}
\begin{bmatrix} 5 & 2 & 1 \\ -3 & 3 & 4 \end{bmatrix}
= \begin{bmatrix} 3 & 18 & 19 \\ -13 & -1 & 2 \\ 24 & 18 & 14 \end{bmatrix}$$
has a trace of $3 + (-1) + 14 = 16$, while:
$$\begin{bmatrix} 5 & 2 & 1 \\ -3 & 3 & 4 \end{bmatrix}
\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}
= \begin{bmatrix} 17 & 24 \\ 9 & -1 \end{bmatrix}$$
has a trace of $17 + (-1) = 16$. Thus while the two matrix products $AB$ and $BA$
are different, their traces, or the sums of their diagonal elements, are the same.

Example 2: If $X$ is an $n \times p$ matrix then an important matrix in econometrics
is the $n \times n$ matrix:
$$P = X\left(X^T X\right)^{-1} X^T.$$
Note that $X^T X$ and $\left(X^T X\right)^{-1}$ are $p \times p$ matrices. We have:
$$\mathrm{tr}[P] = \mathrm{tr}\Big[\underbrace{X\left(X^T X\right)^{-1}}_{A} \underbrace{X^T}_{B}\Big]
= \mathrm{tr}\left[X^T X\left(X^T X\right)^{-1}\right] = \mathrm{tr}[I] = p$$
since $I$ is the $p \times p$ identity matrix.
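Both trace facts are easy to see numerically. This sketch (mine, not the text's) uses the matrices from Example 1 and a randomly generated $X$ for Example 2:

```python
import numpy as np

B = np.array([[3.0, 4.0],
              [-2.0, 1.0],
              [6.0, 2.0]])
A = np.array([[5.0, 2.0, 1.0],
              [-3.0, 3.0, 4.0]])

# tr[BA] = tr[AB] even though BA is 3x3 and AB is 2x2
print(np.trace(B @ A), np.trace(A @ B))  # 16.0 16.0

# tr[P] = p for the projection matrix P = X (X^T X)^{-1} X^T
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))  # n = 100, p = 4
P = X @ np.linalg.inv(X.T @ X) @ X.T
print(round(np.trace(P)))  # 4
```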
3.7 Higher Dimensional Spaces

3.7.1 Vectors as Points in an n Dimensional Space: $\mathbb{R}^n$

This work is dedicated by a humble native of Flatland in the hope
that, even as he was initiated into the mysteries Of THREE Dimensions,
having been previously conversant with ONLY TWO, so
the citizens of that celestial region may aspire yet higher and higher
to the secrets of FOUR FIVE OR EVEN SIX dimensions, thereby
contributing to the enlargement of THE IMAGINATION and the
possible development of that most rare and excellent gift of MODESTY
among the superior races of SOLID HUMANITY. -Edwin
Abbott, Flatland
Edwin Abbott's book is about the inhabitants of Flatland, a world that unlike
our three dimensional world has only two dimensions: forwards and backwards,
right and left, but no up and down. In the book a native of Flatland
communicates with someone from our three dimensional world who tries to
convince him that, besides the two dimensions he experiences, there is yet a
third dimension: up and down. The difficulties the Flatlander experiences
grasping this third dimension then mirror our own difficulties in trying to
understand the possibility of, say, a four dimensional space.
As economists we work with higher dimensional spaces all the time. For
example, in econometrics if you have 100 observations of data, then this is
represented as a point in a 100 dimensional space. Fortunately we do not have
to visually imagine such a space; instead we simply write down our data as a
$100 \times 1$ column vector.
To see why this makes sense, think of a point in one dimension, that is, along
a line or say along a particular street that runs north/south. Someone asks you
where your favourite cafe is and you tell them it's 3 blocks north of here. This
number 3 can then be thought of as a $1 \times 1$ column vector $[3]$, as can any point
along the street, with negative numbers used to indicate points south.
Now consider a two-dimensional space, say the location in a city on any
street. Now someone asks you where your favorite cafe is and you say: "Go 3
blocks north of here and 4 blocks east." This can now be represented by a $2 \times 1$
column vector:
$$\begin{bmatrix} 3 \\ 4 \end{bmatrix}.$$
Now consider three dimensional space. Suppose the cafe is on the 10th floor of
a building. You now say "Go 3 blocks north of here and 4 blocks east and go
up 10 floors". This can be represented by a $3 \times 1$ column vector:
$$\begin{bmatrix} 3 \\ 4 \\ 10 \end{bmatrix}.$$
Let us now try and imagine two four-dimensional beings where one tells his
friend how to get to his favourite cafe. Just as we would give directions with
three numbers, he would have to give directions with four numbers: one for each
dimension. Although we cannot visually imagine it, we could easily write down
the $4 \times 1$ column vector he would give; for example it might be:
$$\begin{bmatrix} 3 \\ 4 \\ 10 \\ 2 \end{bmatrix}$$
where $2$ would represent how far you would have to go in the extra fourth
direction.
Thus while we cannot visualize spaces of four dimensions or higher, we can
easily write down vectors of any dimension and so we are actually able to
investigate spaces of any dimension. We thus have:

Definition 147 A point in an $n$ dimensional space is represented by an $n \times 1$
column vector. This $n$ dimensional space or Euclidean space is denoted by $\mathbb{R}^n$.
3.7.2 Length and Distance

Once we make this leap to higher dimensional spaces, it is natural to ask which
of the properties of 3 dimensional space that we are familiar with can be
extended to $n$ dimensional spaces.
The first important characteristic is length or distance. No doubt a 4
dimensional person would also want to know how far away his favourite cafe is!
We have:

Definition 148 The length of an $n \times 1$ vector $x$ is:
$$\|x\| = \sqrt{x^T x} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$
Definition 149 The distance between two vectors $x$ and $y$ is $\|x - y\|$.

Example: If
$$x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad y = \begin{bmatrix} 5 \\ 6 \\ 7 \end{bmatrix}$$
then the length of $x$ and $y$ and the distance between $x$ and $y$ are given by:
$$\|x\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14} = 3.74$$
$$\|y\| = \sqrt{5^2 + 6^2 + 7^2} = \sqrt{110} = 10.49$$
$$\|x - y\| = \sqrt{(1-5)^2 + (2-6)^2 + (3-7)^2} = \sqrt{48} = 6.93.$$
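The same lengths and distance can be computed with `np.linalg.norm`; this check is my addition, not part of the text:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([5.0, 6.0, 7.0])

print(np.linalg.norm(x))      # sqrt(14)  ≈ 3.742
print(np.linalg.norm(y))      # sqrt(110) ≈ 10.488
print(np.linalg.norm(x - y))  # sqrt(48)  ≈ 6.928

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # True
```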
Two important results for advanced work are:

Theorem 150 $\|x\| = 0$ if and only if $x = 0$, that is, $x$ is an $n \times 1$ vector of
zeros.

Theorem 151 Triangle Inequality: $\|x + y\| \leq \|x\| + \|y\|$.

Remark: The triangle inequality basically states that if you walk along a
straight line to the point $x + y$, then you walk a shorter distance than if you
walk first to $x$ or $y$ and then on to $x + y$; in other words, the shortest distance
between two points in an $n$ dimensional space is still a straight line!
Example: Given $x$ and $y$ above, the triangle inequality is satisfied since:
$$\|x + y\| = \sqrt{(1+5)^2 + (2+6)^2 + (3+7)^2} = \sqrt{200} = 14.14$$
$$< \|x\| + \|y\| = \sqrt{14} + \sqrt{110} = 14.23.$$
3.7.3 Angle and Orthogonality

The second important basic concept for higher dimensional spaces is angle. The
angle between two vectors $x$ and $y$ can be sensibly defined as follows:

Definition 152 Angle: Given two $n \times 1$ vectors $x$ and $y$, the angle between $x$
and $y$ is $\theta$ defined by:
$$\cos(\theta) = \frac{x^T y}{\|x\| \|y\|}.$$
Example: If
$$x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad y = \begin{bmatrix} 6 \\ 1 \\ 7 \end{bmatrix}$$
then you can verify that:
$$x^T y = 29, \quad \|x\| = \sqrt{14}, \quad \|y\| = \sqrt{86}$$
so that:
$$\cos(\theta) = \frac{x^T y}{\|x\| \|y\|} = \frac{29}{\sqrt{14}\sqrt{86}} = 0.835.$$
Using the inverse function of $\cos(\theta)$, $\cos^{-1}$, from your calculator we can
recover $\theta$ as:
$$\theta = \cos^{-1}(0.835) = 33.3$$
so that the angle between $x$ and $y$ is $33.3$ degrees.
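The same angle can be computed in a few lines (my illustration, not from the text):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([6.0, 1.0, 7.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta_deg = np.degrees(np.arccos(cos_theta))
print(round(cos_theta, 3))  # 0.836
print(round(theta_deg, 1))  # 33.3
```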
Corresponding to the requirement in trigonometry that $|\cos(\theta)| \leq 1$ we have:

Theorem 153 Cauchy-Schwarz Inequality:
$$|x^T y| \leq \|x\| \|y\| = \sqrt{x^T x}\sqrt{y^T y}.$$
Proof. Let $\alpha$ be a scalar and define $f(\alpha)$ by $f(\alpha) = \|\alpha x - y\|^2 \geq 0$. Now:
$$f(\alpha) = (\alpha x - y)^T(\alpha x - y) = \alpha^2 \|x\|^2 - 2\alpha x^T y + \|y\|^2.$$
Now the global minimum of $f(\alpha)$ occurs at $\alpha^*$ where $f'(\alpha^*) = 0$, since
$f''(\alpha) = 2\|x\|^2 > 0$, so that:
$$2\alpha^* \|x\|^2 - 2x^T y = 0 \implies \alpha^* = \frac{x^T y}{\|x\|^2}.$$
Thus:
$$f(\alpha^*) \geq 0 \implies \alpha^{*2}\|x\|^2 - 2\alpha^* x^T y + \|y\|^2 \geq 0$$
$$\implies \left(\frac{x^T y}{\|x\|^2}\right)^2 \|x\|^2 - 2\frac{x^T y}{\|x\|^2}\, x^T y + \|y\|^2 \geq 0$$
$$\implies \|y\|^2 \geq \frac{\left(x^T y\right)^2}{\|x\|^2}$$
$$\implies \left(x^T y\right)^2 \leq \|x\|^2 \|y\|^2$$
$$\implies |x^T y| \leq \sqrt{x^T x}\sqrt{y^T y}.$$
Remark: The equality $|x^T y| = \|x\| \|y\|$ occurs only if $y = \delta x$ where $\delta$ is some
scalar. In this case (for $\delta > 0$) the angle between $x$ and $y$ is $0$, so that
$\cos(0) = 1$.

Example: As an illustration of the Cauchy-Schwarz inequality, note that in the
example above:
$$|x^T y| = 29 < \|x\| \|y\| = \sqrt{14} \times \sqrt{86} = 34.7.$$
The most important angle that we will be concerned with is where two
vectors are at right angles, or $\theta = 90^{\circ}$, in which case $\cos(90^{\circ}) = 0$ and hence
$x^T y = 0$.

Definition 154 Orthogonality: If $x^T y = 0$ we say that $x$ and $y$ are
orthogonal to each other, or are at right angles to each other. Sometimes this is
denoted as: $x \perp y$.

Orthogonality and non-orthogonality in $\mathbb{R}^2$ are illustrated below:
Remark: Since $x^T y$ is a scalar it follows that
$$x^T y = \left(x^T y\right)^T = y^T \left(x^T\right)^T = y^T x$$
so that:
$$x^T y = 0 \iff y^T x = 0$$
and so you can check for orthogonality either by calculating $x^T y$ or $y^T x$.

Example 1: In the previous example $x$ and $y$ are not orthogonal since
$x^T y = 29 \neq 0$.
Example 2: Two orthogonal vectors are:
$$x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} 6 \\ -3 \\ 0 \end{bmatrix}$$
since
$$x^T y = 1 \times 6 + 2 \times (-3) + 3 \times 0 = 0.$$
Suppose $x$ and $y$ are orthogonal so the angle between $x$ and $y$ is $90^{\circ}$. Then
$x$ and $y$ form a right-angled triangle with $x$ on one side, $y$ on the other and
the sum $x + y$ on the hypotenuse. You may recall from geometry that for
right-angled triangles the Pythagorean relationship
$$a^2 + b^2 = c^2$$
holds. The same, it turns out, holds for $x$ and $y$ in an $n$ dimensional space.
In particular:

Theorem 155 Pythagorean Relationship: If $x$ and $y$ are orthogonal $n \times 1$
vectors then:
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$

Proof. If $x$ and $y$ are orthogonal then $x^T y = y^T x = 0$ and so:
$$\|x + y\|^2 = (x + y)^T(x + y) = x^T x + y^T y + \underbrace{x^T y}_{=0} + \underbrace{y^T x}_{=0}
= x^T x + y^T y = \|x\|^2 + \|y\|^2.$$
This is illustrated in the diagram below:
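Numerically, the relationship checks out for the orthogonal pair from Example 2 (this verification is my addition, not part of the text):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([6.0, -3.0, 0.0])

# x and y are orthogonal: x^T y = 0
print(x @ y)  # 0.0

# Pythagorean relationship: ||x + y||^2 = ||x||^2 + ||y||^2
lhs = np.linalg.norm(x + y) ** 2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
print(round(lhs, 6), round(rhs, 6))  # 59.0 59.0
```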
3.7.4 Linearly Independent Vectors
Consider a two dimensional space with the two vectors:
$$a_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
These two vectors form a basis for $\mathbb{R}^2$; that is, you can describe any point in
$\mathbb{R}^2$ as a linear combination of $a_1$ and $a_2$. For example, given the vector $x$ below:
$$x = \begin{bmatrix} 3 \\ 4 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 3a_1 + 4a_2.$$
Most but not all pairs of two vectors will form a basis for $\mathbb{R}^2$. For
example, the two vectors:
$$b_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$
do not form a basis for $\mathbb{R}^2$. The key requirement here is that the two vectors
be linearly independent of each other, or that they point in different directions.
Thus the vector $a_1$ points 1 block north while $a_2$ points 1 block east and so
they are linearly independent. The two vectors $b_1$ and $b_2$ both point in the same
direction, north, and so are linearly dependent, or $b_2 = 3b_1$.
These ideas are extended to higher dimensions as follows:

Definition 156 Linear Independence: Given $n$ vectors $a_1, a_2, \ldots, a_n$, we
say that they are linearly independent if for any scalars $x_1, x_2, \ldots, x_n$:
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0 \implies x_1 = 0, x_2 = 0, \ldots, x_n = 0.$$

Definition 157 Given $n$ vectors $a_1, a_2, \ldots, a_n$, we say that they are linearly
dependent if there exist $x_1, x_2, \ldots, x_n$, one of which is not $0$, such that:
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0.$$

This idea can be written more compactly if we think of the column vectors
$a_1, a_2, \ldots, a_n$ as the columns of a matrix $A$. We have:

Definition 158 The $n$ columns $a_1, a_2, \ldots, a_n$ of the $m \times n$ matrix $A$ are
linearly independent if
$$Ax = 0 \implies x = 0$$
where $x$ is an $n \times 1$ column vector.

Definition 159 If $Ax = 0$ for some $x \neq 0$ then the columns of $A$ are linearly
dependent.
Since vectors which are orthogonal to each other must point in different
directions, they must be linearly independent and so:

Theorem 160 If $a_1, a_2, \ldots, a_n$ are mutually orthogonal so that $a_i^T a_j = 0$ for
$i \neq j$ then they are linearly independent.
Example 1: The vectors:
$$a_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
are linearly independent since:
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} x_1 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} x_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
can only be satisfied when $x_1 = 0$ and $x_2 = 0$. Alternatively, putting the two
vectors in a $2 \times 2$ matrix we have:
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\implies \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
and so $a_1$ and $a_2$ are linearly independent. Alternatively, since:
$$a_1^T a_2 = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0$$
it follows that $a_1$ and $a_2$ are orthogonal and hence are linearly independent.
Example 2: The vectors:
$$b_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$
are linearly dependent since:
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} x_1 + \begin{bmatrix} 3 \\ 0 \end{bmatrix} x_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
can be satisfied for non-zero $x_1$ and $x_2$, for example if $x_1 = 1$ and $x_2 = -\frac{1}{3}$.
Alternatively, putting $b_1$ and $b_2$ into a matrix we have, for $x \neq 0$:
$$\begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -\frac{1}{3} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
so that $b_1$ and $b_2$ are linearly dependent.
Example 3: Suppose that
$$a_1 = \begin{bmatrix} 3 \\ -2 \\ 6 \end{bmatrix} \quad \text{and} \quad a_2 = \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}.$$
You may verify that:
$$a_1 x_1 + a_2 x_2 = \begin{bmatrix} 3 \\ -2 \\ 6 \end{bmatrix} x_1 + \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix} x_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
or:
$$\begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
can only be satisfied when $x_1 = x_2 = 0$. Consequently these two vectors are
linearly independent.
Example 4: On the other hand:
$$\begin{bmatrix} 3 \\ -2 \\ 6 \end{bmatrix} x_1 + \begin{bmatrix} -6 \\ 4 \\ -12 \end{bmatrix} x_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
is satisfied when $x_1 = 2$ and $x_2 = 1$. Consequently these two vectors are
linearly dependent.
The notion of linear independence leads to the rank of a matrix.

Definition 161 Rank of a Matrix: The rank of a matrix, denoted by $\mathrm{rank}[A]$,
is the maximum number of linearly independent column vectors of $A$.

Definition 162 Full Rank: If an $n \times n$ matrix $A$ has $\mathrm{rank}[A] = n$ we say
that $A$ has full rank.
We have:

Theorem 163 Properties of the Rank of a Matrix:
1. $\mathrm{rank}[A] = \mathrm{rank}\left[A^T\right]$, so the number of linearly independent column
vectors equals the number of linearly independent row vectors in $A$.
2. If $A$ is an $m \times n$ matrix then $\mathrm{rank}[A] \leq m$ and $\mathrm{rank}[A] \leq n$.
3. If $A$ is a square $n \times n$ matrix then $A$ is non-singular, or $A^{-1}$ exists, if and
only if $\mathrm{rank}[A] = n$.
4. If $A$ is a square $n \times n$ matrix then $A$ is non-singular, or $A^{-1}$ exists, if and
only if $Ax = 0$ implies that $x = 0$, where $x$ is an $n \times 1$ vector.
5. If $A$ is a square $n \times n$ matrix then $A$ is singular, or $A^{-1}$ does not exist, if
and only if there exists an $n \times 1$ vector $x \neq 0$ such that $Ax = 0$.
Example 1: Consider the square matrix:
$$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}.$$
You can verify on your own that $A$ here has a rank of $2$, that is, the vectors
$$\begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$
are linearly independent; consequently $\mathrm{rank}[A] = 2$ and $A^{-1}$ exists, as given
by:
$$\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} \frac{2}{5} & -\frac{1}{5} \\ -\frac{1}{10} & \frac{3}{10} \end{bmatrix}.$$
Example 2: The matrix $B$ given by:
$$B = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$$
has a rank of $1$ since:
$$\begin{bmatrix} 3 \\ 1 \end{bmatrix} \times (-2) + \begin{bmatrix} 6 \\ 2 \end{bmatrix} \times 1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
or there exists a non-zero $x$ such that $Bx = 0$, where:
$$x = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$$
since:
$$\begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Consequently $B^{-1}$ here does not exist, or $B$ is singular.
It is possible to calculate the rank of a matrix using determinants.

Theorem 164 Given any $m \times n$ matrix $A$, suppose that $r$ is the order of the
largest $r \times r$ sub-matrix $\tilde{A}$ of $A$ such that $\det\left[\tilde{A}\right] \neq 0$. Then $\mathrm{rank}[A] = r$.
Example: For the matrix
$$A = \begin{bmatrix} 1 & 2 & 4 & 5 \\ 2 & 5 & 2 & 1 \\ 1 & 2 & 3 & 4 \end{bmatrix}$$
examples of $2 \times 2$ and $3 \times 3$ sub-matrices of $A$ would be:
$$\begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 4 \\ 2 & 5 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$$
Since $A$ is $3 \times 4$ we cannot obtain any larger sub-matrix from $A$ than $3 \times 3$.
From the theorem we have $\mathrm{rank}[A] = 3$ since for the $3 \times 3$ sub-matrix:
$$\det\begin{bmatrix} 1 & 2 & 4 \\ 2 & 5 & 2 \\ 1 & 2 & 3 \end{bmatrix} = -1 \neq 0.$$
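Rank calculations like these can be checked with `np.linalg.matrix_rank` (an illustration of mine, not from the text), using this example's matrix and the singular matrix from Example 2 above:

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0, 5.0],
              [2.0, 5.0, 2.0, 1.0],
              [1.0, 2.0, 3.0, 4.0]])

print(np.linalg.matrix_rank(A))  # 3

# A rank-deficient square matrix has Bx = 0 for some x != 0
B = np.array([[3.0, 6.0],
              [1.0, 2.0]])
print(np.linalg.matrix_rank(B))   # 1
print(B @ np.array([-2.0, 1.0]))  # [0. 0.]
```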
3.8 Solving Systems of Equations
Matrix algebra is important for solving systems of linear equations. For example,
in the demand and supply model:
$$\text{demand}: \quad Q = 6 - \frac{3}{2}P$$
$$\text{supply}: \quad Q = 2 + \frac{1}{2}P$$
plotted below:

[Figure: Supply and Demand]
we wish to find the equilibrium price and quantity, $Q$ and $P$, where the demand
and supply curves intersect. Now if we set $x_1 = Q$ and $x_2 = P$ we can rewrite
the demand and supply curves as:
$$Q = 6 - \frac{3}{2}P \implies x_1 = 6 - \frac{3}{2}x_2 \implies 2x_1 + 3x_2 = 12$$
$$Q = 2 + \frac{1}{2}P \implies x_1 = 2 + \frac{1}{2}x_2 \implies 2x_1 - 1x_2 = 4$$
or as the system of equations:
$$2x_1 + 3x_2 = 12$$
$$2x_1 - 1x_2 = 4.$$
This can in turn be written in matrix notation as:
$$\begin{bmatrix} 2 & 3 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \end{bmatrix}$$
which is in the form $Ax = b$ where:
$$A = \begin{bmatrix} 2 & 3 \\ 2 & -1 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = \begin{bmatrix} 12 \\ 4 \end{bmatrix}.$$
In the above example we have two equations and two unknowns. In general,
in order to have a unique solution one needs as many equations as unknowns.
Thus consider a system of $n$ equations in $n$ unknowns as:
$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n$$
which can be written in matrix notation in the form $Ax = b$ as:
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$
A special property of linear systems of equations is that the number of
possible solutions is limited. In particular:
Theorem 165 Systems of linear equations have either no solution, one solution,
or an infinite number of solutions.

Generally we are interested in the case where one solution exists, so that our
model predicts that one thing and only one thing happens. We have:

Theorem 166 The system of equations $Ax = b$ has a unique solution if and
only if $A^{-1}$ exists, in which case the unique solution is:
$$x = A^{-1}b.$$
Example 1: You can verify that the system of equations in the supply and
demand example above:
$$\begin{bmatrix} 2 & 3 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \end{bmatrix}$$
has a unique solution:
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 2 & 3 \\ 2 & -1 \end{bmatrix}^{-1}\begin{bmatrix} 12 \\ 4 \end{bmatrix}
= \begin{bmatrix} \frac{1}{8} & \frac{3}{8} \\ \frac{1}{4} & -\frac{1}{4} \end{bmatrix}\begin{bmatrix} 12 \\ 4 \end{bmatrix}
= \begin{bmatrix} 3 \\ 2 \end{bmatrix}$$
and so the unique solution is $x_1 = 3$ and $x_2 = 2$, so that $Q = 3$ and $P = 2$.
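The same equilibrium can be computed with numpy (my illustration, not from the text); `np.linalg.solve` is generally preferred to forming $A^{-1}$ explicitly:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, -1.0]])
b = np.array([12.0, 4.0])

# Solve Ax = b directly
x = np.linalg.solve(A, b)
print(x)  # [3. 2.]  i.e. Q = 3, P = 2

# Same answer via the inverse
print(np.linalg.inv(A) @ b)  # [3. 2.]
```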
Example 2: For the system of equations:
$$3x_1 + 2x_2 = 7$$
$$6x_1 + 4x_2 = 14$$
the second equation is just the first multiplied by $2$ and so there is really only
one equation. This means that there are an infinite number of solutions of the
form:
$$x_2 = \frac{7 - 3x_1}{2}$$
where $x_1$ can be any number. Another way of seeing that there is a problem
here is that the matrix $A$ is a singular matrix; that is:
$$\det\begin{bmatrix} 3 & 2 \\ 6 & 4 \end{bmatrix} = 0$$
and so $A^{-1}$ does not exist.
Example 3: The system of equations:
$$3x_1 + 2x_2 = 7$$
$$6x_1 + 4x_2 = 13$$
has no solutions, since if we divide both sides of the second equation by $2$ we
obtain:
$$3x_1 + 2x_2 = 7$$
$$3x_1 + 2x_2 = 6.5$$
which implies that $7 = 6.5$. Again, another way of seeing that there is a problem
here is that the matrix $A$ is a singular matrix since:
$$\det\begin{bmatrix} 3 & 2 \\ 6 & 4 \end{bmatrix} = 0$$
and so $A^{-1}$ does not exist.
Since we have seen that there are a number of different necessary and
sufficient conditions for $A^{-1}$ to exist, we can state the above result more
generally as:

Theorem 167 If $A$ is an $n \times n$ matrix, the following statements are equivalent
in the sense that if one statement holds all the rest hold as well:
1. $A^{-1}$ exists.
2. $\mathrm{rank}[A] = n$.
3. $\det[A] \neq 0$.
4. $Ax = b$ has a unique solution (i.e., $x = A^{-1}b$).
5. $Ax = 0 \implies x = 0$.
Since these results are necessary and sufficient, we can restate this result
using the negation of the above five statements as:

Theorem 168 If $A$ is an $n \times n$ matrix, the following statements are equivalent
in the sense that if one statement holds all the rest hold as well:
1. $A^{-1}$ does not exist.
2. $\mathrm{rank}[A] < n$.
3. $\det[A] = 0$.
4. $Ax = b$ has either no solution or an infinite number of solutions.
5. There exists an $n \times 1$ vector $x \neq 0$ such that $Ax = 0$.
3.8.1 Cramer's Rule

Cramer's rule, a method for solving systems of equations using determinants,
is used a lot in economics. Given a system of equations $Ax = b$, suppose we
want to calculate the $i$th component $x_i$. The key operation is replacing the $i$th
column of $A$ with $b$.

Definition 169 Given an $n \times n$ matrix $A$ and an $n \times 1$ column vector $b$, define
$A_i(b)$ as the $n \times n$ matrix obtained by replacing the $i$th column of $A$ with $b$.
Example: Given:
A=
·
1 2
3 4
¸
; b=
·
5
6
¸
then we obtain A1 (b) by putting b in the …rst column to obtain:
·
¸
5 2
A1 (b) =
;
6 4
and we obtain A2 (b) by putting b in the second column to obtain:
·
¸
1 5
A1 (b) =
:
3 6
Cramer’s rule then is:
CHAPTER 3. MATRIX ALGEBRA
143
Theorem 170 Cramer’s Rule: Given the system of equations: Ax = b with
det [A] 6= 0 then:
xi =
det [Ai (b)]
:
det [A]
Example 1: The system of equations:
$$3x_1 + 2x_2 = 7$$
$$5x_1 - 2x_2 = 10$$
can be rewritten in matrix form as:
$$\begin{bmatrix} 3 & 2 \\ 5 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \end{bmatrix}.$$
Using Cramer's rule we have:
$$x_1 = \frac{\det \begin{bmatrix} 7 & 2 \\ 10 & -2 \end{bmatrix}}{\det \begin{bmatrix} 3 & 2 \\ 5 & -2 \end{bmatrix}} = \frac{17}{8}, \quad x_2 = \frac{\det \begin{bmatrix} 3 & 7 \\ 5 & 10 \end{bmatrix}}{\det \begin{bmatrix} 3 & 2 \\ 5 & -2 \end{bmatrix}} = \frac{5}{16}.$$
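The rule in Theorem 170 translates directly into code. Here is a minimal numpy sketch (the helper name `cramer` is ours):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule: x_i = det(A_i(b)) / det(A)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # replace the i-th column of A with b
        x[i] = np.linalg.det(Ai) / det_A
    return x

x = cramer([[3, 2], [5, -2]], [7, 10])   # Example 1: x = (17/8, 5/16)
```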
Example 2: Given the system of equations:
$$\begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \\ 7 \end{bmatrix}$$
to find $x_2$ using Cramer's rule we replace the second column of $A$ with $b$ so that:
$$x_2 = \frac{\det \begin{bmatrix} 3 & 5 & 4 \\ 1 & 6 & 6 \\ 3 & 7 & 8 \end{bmatrix}}{\det \begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 6 \\ 3 & 1 & 8 \end{bmatrix}} = \frac{6}{5}.$$

3.9 Eigenvalues and Eigenvectors

3.9.1 Eigenvalues
Suppose we have a square $n \times n$ matrix $A$ and we multiply it by an $n \times 1$ column vector $x$ to obtain:
$$y = Ax.$$
Note that $y$ is itself an $n \times 1$ column vector. There are very special vectors $x$, called eigenvectors, which have the property that $y = \lambda x$, in which case $\lambda$ is an eigenvalue. These turn out to be of fundamental importance in understanding matrices.

Definition 171 Eigenvalues and Eigenvectors: Let $A$ be a square $n \times n$ matrix and suppose that:
$$Ax = \lambda x$$
where $x$ is an $n \times 1$ column vector and $\lambda$ is a scalar. Then we say that $x$ is an eigenvector of $A$ and $\lambda$ is an eigenvalue.

An $n \times n$ matrix $A$ will in general have $n$ eigenvalues, which are the roots of the characteristic polynomial associated with $A$:

Definition 172 Characteristic Polynomial: Given an $n \times n$ matrix $A$, the characteristic polynomial of $A$ is:
$$f(\lambda) = \det[A - \lambda I] = \alpha_0 \lambda^n + \alpha_1 \lambda^{n-1} + \alpha_2 \lambda^{n-2} + \cdots + \alpha_n$$
where the coefficients $\alpha_j$ depend on the elements of the matrix $A$.

Theorem 173 An $n \times n$ matrix $A$ has $n$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, which are the roots of the characteristic polynomial of $A$:
$$f(\lambda_i) = 0 \text{ for } i = 1, 2, \ldots, n.$$
Proof. Since $x = Ix$ we can rewrite $Ax = \lambda x$ as:
$$Ax = \lambda I x$$
or as:
$$(A - \lambda I)x = 0.$$
Since this equation is of the form $Bx = 0$ where $B = (A - \lambda I)$, and since we require that $x \neq 0$, it follows from Theorem 163 that this can only hold if $B$ is singular, so that:
$$\det[B] = f(\lambda) = \det[A - \lambda I] = 0$$
and so $\lambda$ is a root of $f(\lambda)$. Since the characteristic polynomial $f(\lambda)$ is an $n^{th}$ degree polynomial, by Theorem 19, the fundamental theorem of algebra, $f(\lambda)$ has $n$ roots, which are the $n$ eigenvalues of $A$.
Example: The $2 \times 2$ matrix $A$:
$$A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$$
has a characteristic polynomial which is a quadratic given by:
$$f(\lambda) = \det\left[ \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right] = \det \begin{bmatrix} 5 - \lambda & -2 \\ -2 & 8 - \lambda \end{bmatrix} = \lambda^2 - 13\lambda + 36.$$
Note that the coefficient $-13$ on $\lambda$ is $-\mathrm{tr}[A]$ and the constant term $36$ is $\det[A]$, which is always the case for $2 \times 2$ matrices.

To find the eigenvalues of $A$ we need to find the roots of this quadratic, or the solutions to:
$$\lambda^2 - 13\lambda + 36 = 0$$
which are:
$$\lambda_{1,2} = \frac{-(-13) \pm \sqrt{(-13)^2 - 4 \times 36}}{2}$$
or $\lambda_1 = 4$ and $\lambda_2 = 9$.
In the example above, note that $\mathrm{tr}[A] = 13 = \lambda_1 + \lambda_2 = 4 + 9$ and $\det[A] = 36 = \lambda_1 \lambda_2 = 4 \times 9$, so that the trace is equal to the sum of the eigenvalues and the determinant is equal to the product of the eigenvalues. This turns out always to be the case, so that:

Theorem 174 Given any $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$:
$$\det[A] = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$
$$\mathrm{tr}[A] = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$
Since $\det[A]$ is the product of the eigenvalues we have:

Theorem 175 $A^{-1}$ exists if and only if no eigenvalue of $A$ is equal to $0$.
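Theorem 174 is easy to check numerically for the example above; the following is a minimal numpy sketch:

```python
import numpy as np

A = np.array([[5.0, -2.0], [-2.0, 8.0]])
eigenvalues = np.linalg.eigvals(A)      # roots of det(A - lambda*I) = 0

# Theorem 174: trace = sum of eigenvalues, determinant = their product.
assert np.isclose(eigenvalues.sum(), np.trace(A))        # 4 + 9 = 13
assert np.isclose(eigenvalues.prod(), np.linalg.det(A))  # 4 * 9 = 36

sorted_eigs = np.sort(eigenvalues.real)
```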
An important fact about eigenvalues is that:

Theorem 176 $A$ and $A^T$ have the same eigenvalues.

Proof. Since $\det[B] = \det\left[B^T\right]$ we have:
$$f(\lambda) = \det[A - \lambda I] = \det\left[(A - \lambda I)^T\right] = \det\left[A^T - \lambda I\right]$$
and so $A^T$ and $A$ have the same characteristic polynomial and hence the same eigenvalues.
As you might expect, calculating eigenvalues for upper and lower triangular matrices (as well as diagonal matrices) is very easy. We have:

Theorem 177 If $A = [a_{ij}]$ is an upper or lower triangular matrix or a diagonal matrix, then the eigenvalues of $A$ are the diagonal elements of $A$.

Proof. Given the assumptions about $A$, the characteristic polynomial of $A$ is:
$$f(\lambda) = \det[A - \lambda I] = (a_{11} - \lambda)(a_{22} - \lambda) \times \cdots \times (a_{nn} - \lambda)$$
since $A - \lambda I$ is upper or lower triangular and the determinant of such a matrix is the product of the diagonal elements. Therefore if $\lambda = a_{ii}$ then $f(\lambda) = 0$ and $\lambda$ is an eigenvalue.

Example: The $3 \times 3$ matrix $A$ below:
$$A = \begin{bmatrix} 4 & 77 & 99 \\ 0 & 5 & 55 \\ 0 & 0 & 6 \end{bmatrix}$$
is upper triangular, so that its characteristic polynomial is:
$$f(\lambda) = (4 - \lambda)(5 - \lambda)(6 - \lambda)$$
and so the eigenvalues are the diagonal elements: $\lambda_1 = 4$, $\lambda_2 = 5$, $\lambda_3 = 6$.
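A quick numerical confirmation of Theorem 177 for this triangular example (a sketch using numpy):

```python
import numpy as np

# An upper triangular matrix: its eigenvalues are just its diagonal.
A = np.array([[4.0, 77.0, 99.0],
              [0.0,  5.0, 55.0],
              [0.0,  0.0,  6.0]])

eigs = np.sort(np.linalg.eigvals(A).real)
diagonal = np.sort(np.diag(A))
```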
3.9.2 Eigenvectors

For an $n \times n$ matrix $A$, associated with each of the $n$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ will be $n$ eigenvectors $x_1, x_2, \ldots, x_n$ which satisfy:
$$Ax_i = \lambda_i x_i.$$
We have:

Theorem 178 Eigenvectors associated with distinct eigenvalues are linearly independent.

Generally an $n \times n$ matrix will have $n$ distinct eigenvalues, so that there will be $n$ linearly independent eigenvectors. This in turn means that:

Theorem 179 If all eigenvalues of an $n \times n$ matrix $A$ are distinct, then the matrix of eigenvectors $C$ given by:
$$C = [x_1, x_2, \ldots, x_n]$$
has $\mathrm{rank}[C] = n$, so that $C^{-1}$ exists.
Remark: Complications can arise when there are repeated eigenvalues. For example, if the characteristic polynomial of a $3 \times 3$ matrix $A$ were:
$$f(\lambda) = (\lambda - 2)^2 (\lambda - 6)$$
then the eigenvalues would be $\lambda_1 = 2$, $\lambda_2 = 2$, $\lambda_3 = 6$, and so there would be two repeated eigenvalues equal to $2$. In this case there might only be 2 linearly independent eigenvectors rather than 3.

Another complication with eigenvectors is that, unlike eigenvalues, they are not uniquely defined. In particular, if $x_i$ is an eigenvector associated with the eigenvalue $\lambda_i$, then any scalar multiple of $x_i$ will also be an eigenvector; that is, if $\alpha$ is any scalar then:
$$Ax = \lambda x \Longrightarrow A(\alpha x) = \lambda(\alpha x).$$
For example, if $x$ is an eigenvector then $A(3x) = \lambda(3x)$ and so $3x$ is also an eigenvector.

To pin down an eigenvector one needs to adopt some convention. This convention changes with the application according to what is convenient. Often, for example, we adopt the convention that the eigenvectors have unit length, so that if $x$ is an arbitrary eigenvector we work with $\tilde{x} = \frac{1}{\|x\|}x$, which then satisfies $\|\tilde{x}\| = 1$.
Example: Consider the matrix:
$$A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$$
which we have seen has eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 9$. The associated eigenvectors are:
$$x_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \leftrightarrow \lambda_1 = 4, \quad x_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \leftrightarrow \lambda_2 = 9.$$
For example:
$$\begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 4 \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
You can verify that $x_1$ and $x_2$ are linearly independent since:
$$C = [x_1, x_2] = \begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix} \Longrightarrow \det[C] = -5 \neq 0.$$
Here $x_1$ and $x_2$ are not unique. Instead of $x_1$ we could equally well use the eigenvector $3x_1$ given by:
$$3x_1 = \begin{bmatrix} 6 \\ 3 \end{bmatrix} \Longrightarrow \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix} \begin{bmatrix} 6 \\ 3 \end{bmatrix} = 4 \begin{bmatrix} 6 \\ 3 \end{bmatrix}.$$
We can normalize $x_1$ and $x_2$ so that $\|x_1\| = 1$ and $\|x_2\| = 1$ using:
$$\|x_1\| = \sqrt{2^2 + 1^2} = \sqrt{5}, \quad \|x_2\| = \sqrt{1^2 + (-2)^2} = \sqrt{5}$$
and so the normalized eigenvectors would be:
$$\tilde{x}_1 = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}, \quad \tilde{x}_2 = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix}.$$
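Numerical libraries typically adopt exactly the unit-length convention just described. A minimal numpy sketch for this example:

```python
import numpy as np

A = np.array([[5.0, -2.0], [-2.0, 8.0]])

# np.linalg.eig returns eigenvalues and unit-length eigenvectors
# (as the columns of the second output).
eigenvalues, C = np.linalg.eig(A)

# Each column satisfies A x = lambda x:
for i in range(2):
    assert np.allclose(A @ C[:, i], eigenvalues[i] * C[:, i])

# Any scalar multiple of an eigenvector is also an eigenvector:
x = 3 * C[:, 0]
assert np.allclose(A @ x, eigenvalues[0] * x)

det_C = np.linalg.det(C)   # nonzero: the eigenvectors are independent
```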
3.9.3 The Relationship $A = C\Lambda C^{-1}$

We have seen that diagonal matrices are much easier to work with. It turns out that almost all matrices can be transformed into diagonal matrices with the eigenvalues along the diagonal. More precisely:

Theorem 180 If an $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors, then it can be written as:
$$A = C\Lambda C^{-1}$$
where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ along the diagonal:
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
and the $i^{th}$ column of the $n \times n$ matrix $C$ is the $i^{th}$ eigenvector $x_i$:
$$C = [x_1, x_2, \ldots, x_n].$$
Proof. We have:
$$AC = [Ax_1, Ax_2, \ldots, Ax_n] = [\lambda_1 x_1, \lambda_2 x_2, \ldots, \lambda_n x_n] = C\Lambda.$$
Since the eigenvectors are linearly independent, $\mathrm{rank}[C] = n$ and so $C^{-1}$ exists. Post-multiplying both sides by $C^{-1}$ then yields $A = C\Lambda C^{-1}$.
Remark: There are some matrices which cannot be written as $A = C\Lambda C^{-1}$, but these are in some sense very rare. An example of such a matrix is:
$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$
These exceptional matrices have two characteristics: 1) they have repeated eigenvalues and 2) they are not symmetric. Thus in the example above, since the matrix is upper triangular, we have the repeated eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 1$. For such matrices one can use the Jordan representation, which we do not discuss here.
Given $A = C\Lambda C^{-1}$, suppose we multiply $A$ by itself as $A^2 = A \times A$. Using the representation $A = C\Lambda C^{-1}$ we have:
$$A^2 = C\Lambda \underbrace{C^{-1}C}_{=I}\Lambda C^{-1} = C\Lambda\Lambda C^{-1} = C\Lambda^2 C^{-1}$$
where, since $\Lambda$ is diagonal:
$$\Lambda^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix}.$$
That is, we just square the eigenvalues along the diagonal of $\Lambda$. This means that the eigenvalues of $A^2$ are just the squares of the eigenvalues of $A$. In general we have:
Theorem 181 Given an $n \times n$ matrix $A$ written as $A = C\Lambda C^{-1}$, then:
$$A^n = C\Lambda^n C^{-1}.$$
Proof. (by induction). The theorem is true for $n = 1$. Assuming it is true for $n - 1$ we have:
$$A^n = A^{n-1} \times A = C\Lambda^{n-1} \underbrace{C^{-1}C}_{=I}\Lambda^1 C^{-1} = C\Lambda^{n-1}\Lambda^1 C^{-1} = C\Lambda^n C^{-1}.$$
Theorem 182 If $A^{-1}$ exists then:
$$A^{-1} = C\Lambda^{-1}C^{-1}.$$
Proof. Given that $A^{-1}$ exists, all eigenvalues are non-zero and so $\Lambda^{-1}$ exists. Therefore:
$$C\Lambda^{-1}C^{-1}A = C\Lambda^{-1}\underbrace{C^{-1}C}_{=I}\Lambda C^{-1} = C\Lambda^{-1}\Lambda C^{-1} = CC^{-1} = I.$$
Example 1: The matrix:
$$A = \begin{bmatrix} \frac{7}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix}$$
has eigenvalues and eigenvectors given by:
$$\lambda_1 = 2 \leftrightarrow x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \lambda_2 = 3 \leftrightarrow x_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$
so that:
$$\Lambda = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}.$$
The representation $A = C\Lambda C^{-1}$ then takes the form:
$$\begin{bmatrix} \frac{7}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix}$$
which you can verify by carrying out the multiplication.

To calculate $A^2$ directly we have:
$$A^2 = \begin{bmatrix} \frac{7}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix} \times \begin{bmatrix} \frac{7}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix} = \begin{bmatrix} \frac{17}{3} & -\frac{5}{3} \\ -\frac{10}{3} & \frac{22}{3} \end{bmatrix}$$
while with $A^2 = C\Lambda^2 C^{-1}$ we have:
$$\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} 2^2 & 0 \\ 0 & 3^2 \end{bmatrix} \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix} = \begin{bmatrix} \frac{17}{3} & -\frac{5}{3} \\ -\frac{10}{3} & \frac{22}{3} \end{bmatrix}.$$
To calculate $A^{-1}$ from $A^{-1} = C\Lambda^{-1}C^{-1}$ we have:
$$\begin{bmatrix} \frac{7}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} 2^{-1} & 0 \\ 0 & 3^{-1} \end{bmatrix} \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} \end{bmatrix} = \begin{bmatrix} \frac{4}{9} & \frac{1}{18} \\ \frac{1}{9} & \frac{7}{18} \end{bmatrix}.$$
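These calculations can be reproduced numerically; the following is a minimal numpy sketch of the $A = C\Lambda C^{-1}$ representation and its use for powers and inverses:

```python
import numpy as np

A = np.array([[7/3, -1/3],
              [-2/3, 8/3]])

eigenvalues, C = np.linalg.eig(A)   # columns of C are eigenvectors
Lam = np.diag(eigenvalues)
C_inv = np.linalg.inv(C)

# The representation A = C Lambda C^{-1}:
assert np.allclose(A, C @ Lam @ C_inv)

# Powers and the inverse via the diagonal factor:
A2 = C @ Lam**2 @ C_inv                     # Lam**2 squares the diagonal
A_inv = C @ np.diag(1 / eigenvalues) @ C_inv
```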
Example 2: We can use the representation $A = C\Lambda C^{-1}$ to prove Theorem 174. We have:
$$\det[A] = \det\left[C\Lambda C^{-1}\right] = \det[C]\det[\Lambda]\det\left[C^{-1}\right] = \det[C]\det[\Lambda]\frac{1}{\det[C]} = \det[\Lambda] = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$
since the determinant of a diagonal matrix is the product of the diagonal elements. Similarly, since $\mathrm{tr}[AB] = \mathrm{tr}[BA]$, we have:
$$\mathrm{tr}[A] = \mathrm{tr}\left[C\Lambda C^{-1}\right] = \mathrm{tr}\left[\Lambda C^{-1}C\right] = \mathrm{tr}[\Lambda] = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$
Example 3: We can use the representation $A = C\Lambda C^{-1}$ to prove the matrix version of the geometric series:

Theorem 183 Given an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ which all satisfy $|\lambda_i| < 1$, then:
$$(I - A)^{-1} = I + A + A^2 + A^3 + \cdots.$$
For example, with:
$$A = \begin{bmatrix} 0.3 & 0.65 \\ 0.2 & 0.72 \end{bmatrix}$$
the two eigenvalues of $A$ are $\lambda_1 = 0.92725$ and $\lambda_2 = 0.09275$. Since these both satisfy $|\lambda_i| < 1$ we have:
$$\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 0.3 & 0.65 \\ 0.2 & 0.72 \end{bmatrix} \right)^{-1} = \begin{bmatrix} 4.24 & 9.85 \\ 3.03 & 10.61 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0.3 & 0.65 \\ 0.2 & 0.72 \end{bmatrix} + \begin{bmatrix} 0.3 & 0.65 \\ 0.2 & 0.72 \end{bmatrix}^2 + \cdots.$$
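The convergence of the matrix geometric series is easy to see numerically; the following is a minimal numpy sketch for this example:

```python
import numpy as np

A = np.array([[0.3, 0.65],
              [0.2, 0.72]])

# All eigenvalues lie strictly inside the unit circle, so the series
# I + A + A^2 + ... converges.
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)

target = np.linalg.inv(np.eye(2) - A)

# Partial sums I + A + ... + A^k approach (I - A)^{-1}.
S = np.eye(2)
term = np.eye(2)
for _ in range(300):
    term = term @ A
    S = S + term
```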
3.9.4 Left and Right-Hand Eigenvectors

Although $A$ and $A^T$ have the same eigenvalues, they do not in general share the same eigenvectors. Let $y_i$ be the eigenvector of $A^T$ corresponding to eigenvalue $\lambda_i$ and let $x_i$ be the eigenvector of $A$. Since $y_i$ satisfies:
$$A^T y_i = \lambda_i y_i$$
by taking transposes of both sides we obtain:
$$y_i^T A = \lambda_i y_i^T.$$
For this reason $y_i^T$ is referred to as a left-hand eigenvector of $A$, while $x_i$ is the right-hand eigenvector. We thus have:

Definition 184 The left and right-hand eigenvectors of $A$ corresponding to eigenvalue $\lambda_i$ are defined respectively by:
$$y_i^T A = \lambda_i y_i^T, \quad Ax_i = \lambda_i x_i.$$
We then have the following result:

Theorem 185 The left and right-hand eigenvectors of a matrix $A$ corresponding to different eigenvalues are orthogonal to each other.

Proof. Let $y_j$ be the left-hand eigenvector corresponding to the eigenvalue $\lambda_j$ and let $x_i$ be the right-hand eigenvector corresponding to the eigenvalue $\lambda_i$, with $\lambda_i \neq \lambda_j$. Then:
$$y_j^T A = \lambda_j y_j^T \Longrightarrow y_j^T A x_i = \lambda_j y_j^T x_i$$
$$Ax_i = \lambda_i x_i \Longrightarrow y_j^T A x_i = \lambda_i y_j^T x_i$$
so that:
$$\lambda_i y_j^T x_i = \lambda_j y_j^T x_i \Longrightarrow (\lambda_i - \lambda_j) y_j^T x_i = 0 \Longrightarrow y_j^T x_i = 0$$
where the last implication follows from $\lambda_i \neq \lambda_j$. Since $y_j^T x_i = 0$ it follows that $y_j$ and $x_i$ are orthogonal.
Example: We have seen above that the matrix:
$$A = \begin{bmatrix} \frac{7}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix}$$
has right-hand eigenvectors given by:
$$\lambda_1 = 2 \leftrightarrow x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \lambda_2 = 3 \leftrightarrow x_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}.$$
The left-hand eigenvectors are the eigenvectors calculated from $A^T$:
$$y_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \leftrightarrow \lambda_1 = 2, \quad y_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \leftrightarrow \lambda_2 = 3$$
since, for example:
$$A^T y_1 = \begin{bmatrix} \frac{7}{3} & -\frac{2}{3} \\ -\frac{1}{3} & \frac{8}{3} \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix} = 2 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 2y_1.$$
As predicted by the theorem, the eigenvectors $x_1$ and $y_2$ are orthogonal since:
$$x_1^T y_2 = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = 1 \times -1 + 1 \times 1 = 0$$
and the eigenvectors $x_2$ and $y_1$ are orthogonal since:
$$x_2^T y_1 = \begin{bmatrix} 1 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 1 \times 2 + (-2) \times 1 = 0.$$
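The orthogonality in Theorem 185 can be checked numerically by computing the eigenvectors of $A$ and of $A^T$ separately; a minimal numpy sketch (the ordering step is ours, since `eig` does not sort eigenvalues):

```python
import numpy as np

A = np.array([[7/3, -1/3],
              [-2/3, 8/3]])

# Right-hand eigenvectors come from A, left-hand ones from A^T.
eigs_right, X = np.linalg.eig(A)
eigs_left, Y = np.linalg.eig(A.T)

# Match the eigenvalue ordering of the two decompositions.
X = X[:, np.argsort(eigs_right)]
Y = Y[:, np.argsort(eigs_left)]

# Left and right eigenvectors for *different* eigenvalues are orthogonal.
cross = Y[:, 0] @ X[:, 1]
```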
3.9.5 Symmetric and Orthogonal Matrices

A nice property for a matrix to have is that its transpose equals its inverse. Such matrices are called orthogonal matrices:

Definition 186 If $C^{-1} = C^T$ then $C$ is an orthogonal matrix.

Remark 1: If $C$ is orthogonal and is written as a collection of column vectors $C = [c_1, c_2, \ldots, c_n]$, then the columns of $C$ are orthogonal to each other, that is $c_i^T c_j = 0$ for $i \neq j$, and have unit length, that is $\|c_i\| = \sqrt{c_i^T c_i} = 1$.

Remark 2: If $C$ is orthogonal then it preserves length. That is, given any $n \times 1$ vector $x$ and $y = Cx$ we have:
$$\|y\| = \sqrt{y^T y} = \sqrt{(Cx)^T(Cx)} = \sqrt{x^T C^T C x} = \sqrt{x^T C^{-1} C x} = \sqrt{x^T x} = \|x\|.$$
Remark 3: The only scalars with the property $x = x^{-1}$ are $1$ and $-1$. Similarly, any real eigenvalue of an orthogonal matrix must equal $1$ or $-1$. This follows since the eigenvalues of $C$ and $C^T$ are the same, and $C^T = C^{-1}$. Since the eigenvalues of $C^{-1}$ are the inverses of the eigenvalues of $C$, they must satisfy $\lambda = \lambda^{-1}$.
It turns out that when a matrix $A$ is symmetric, the representation $A = C\Lambda C^{-1}$ always exists. Furthermore, the matrix $C$ is orthogonal, that is, $C^{-1} = C^T$, so that:

Theorem 187 If $A$ is a symmetric matrix then it can be written as:
$$A = C\Lambda C^{-1} = C\Lambda C^T$$
where $C$ is an orthogonal matrix, i.e., $C^T = C^{-1}$.

Example: The symmetric matrix $A$ given by:
$$A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$$
has eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 7$. The representation $A = C\Lambda C^{-1} = C\Lambda C^T$ takes the form:
$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
where:
$$C = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}, \quad C^{-1} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}, \quad \Lambda = \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}.$$
We then have:
$$A^2 = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix} \times \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 29 & -20 \\ -20 & 29 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3^2 & 0 \\ 0 & 7^2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
and:
$$A^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3^{-1} & 0 \\ 0 & 7^{-1} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} \frac{5}{21} & \frac{2}{21} \\ \frac{2}{21} & \frac{5}{21} \end{bmatrix}.$$
The matrix $C$ is orthogonal since:
$$C^T C = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \Longrightarrow C^{-1} = C^T.$$
Note that the column vectors of $C$ are orthogonal to each other and have a length of 1.
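Numerically, the symmetric case is handled by a dedicated routine; a minimal numpy sketch for this example:

```python
import numpy as np

A = np.array([[5.0, -2.0],
              [-2.0, 5.0]])

# For a symmetric matrix, np.linalg.eigh returns real eigenvalues and
# an orthogonal matrix of eigenvectors, so A = C @ diag(lam) @ C.T.
lam, C = np.linalg.eigh(A)

assert np.allclose(C @ np.diag(lam) @ C.T, A)
assert np.allclose(C.T @ C, np.eye(2))   # C is orthogonal: C^T = C^{-1}
```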
3.10 Linear and Quadratic Functions in $\Re^{n+1}$

In this section we begin our treatment of multivariate functions. This topic will be treated in more generality in the next chapter. Here we emphasize linear algebra concepts, in particular the multivariate generalizations of linear and quadratic functions.

3.10.1 Linear Functions

A line in the $(x, y)$ plane, that is, in the two-dimensional space $\Re^2$, can be represented by the linear function $y = ax + b$. Suppose we try to generalize this to the case where $x$ is an $n \times 1$ vector. In this case we have the equivalent of a line in an $\Re^{n+1}$ dimensional space, with $n$ dimensions for $x$ and 1 dimension for $y$.

Definition 188 A linear function in an $n+1$ dimensional space $\Re^{n+1}$ takes the form:
$$y = a^T x + b$$
where $a$ and $x$ are $n \times 1$ vectors and $b$ is a scalar.
Example: Consider the case where $n = 2$ and:
$$a = \begin{bmatrix} 2 \\ -3 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = 10$$
then the linear function is:
$$y = a^T x + b = \begin{bmatrix} 2 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 10 = 2x_1 - 3x_2 + 10.$$
Since $n = 2$ this describes a plane in a $2 + 1 = 3$ dimensional space $\Re^3$, with two dimensions for the $x$'s and one for $y$.

[Figure: the plane $y = 2x_1 - 3x_2 + 10$ plotted against $x_1$ and $x_2$]
3.10.2 Quadratics

Next to the linear function, the most basic function in the $(x, y)$ plane is the quadratic $y = ax^2$. Consider the problem of generalizing this to where $x$ is an $n \times 1$ vector. If $x$ is a vector we cannot write $ax^2$, since $x^2 = x \times x$ is not defined! However, if we rewrite this as $ax^2 = x^T a x$, and replace $a$ by an $n \times n$ matrix $A$, then this is defined when we let $x$ be an $n \times 1$ vector. This turns out to be the most useful generalization.

Definition 189 Quadratic Form: If $x$ is an $n \times 1$ vector and $A$ is a symmetric $n \times n$ matrix, then a quadratic form in $\Re^{n+1}$ is defined as:
$$x^T A x.$$
Remark: There is no loss of generality in assuming that $A$ is symmetric, since if $A$ is not symmetric we could replace $A$ with $B = \left(A + A^T\right)/2$, which is symmetric, and:
$$x^T B x = \frac{x^T\left(A + A^T\right)x}{2} = \frac{1}{2}\left(x^T A x + x^T A^T x\right) = x^T A x$$
since $x^T A x$ is a scalar and consequently $x^T A x = \left(x^T A x\right)^T = x^T A^T x$.
Example 1: If $n = 2$ then:
$$y = x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a_{11}x_1^2 + 2a_{12}x_1 x_2 + a_{22}x_2^2.$$
Example 2: If $n = 2$ and:
$$A = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}$$
then:
$$y = x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2x_1^2 - 2x_1 x_2 + 3x_2^2.$$
This quadratic form describes a valley-like quadratic in 3-dimensional space.

[Figure: the quadratic form $y = 2x_1^2 - 2x_1 x_2 + 3x_2^2$]
Now to generalize from $y = ax^2 + bx + c$, we replace $ax^2$ with the quadratic form $x^T A x$ and we replace the linear function $bx + c$ with its multivariate generalization $b^T x + c$ to obtain:

Definition 190 Quadratic: A quadratic in $\Re^{n+1}$ takes the form:
$$y = x^T A x + b^T x + c$$
where $A$ is a symmetric $n \times n$ matrix, $b$ and $x$ are $n \times 1$ column vectors, and $y$ and $c$ are scalars.

Example 1: If $n = 2$ and:
$$A = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \quad c = 10$$
then:
$$y = x^T A x + b^T x + c = 2x_1^2 - 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 + 10.$$
This describes a valley-like quadratic in 3-dimensional space, or $\Re^3$.

[Figure: the quadratic $y = 2x_1^2 - 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 + 10$]

Example 2: If we replace $A$ with $-A$ in Example 1 then the quadratic becomes:
$$y = -2x_1^2 + 2x_1 x_2 - 3x_2^2 + 4x_1 + 5x_2 + 10$$
which describes a mountain-like function in three dimensions, or $\Re^3$.

[Figure: the quadratic $y = -2x_1^2 + 2x_1 x_2 - 3x_2^2 + 4x_1 + 5x_2 + 10$]
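The matrix form and the expanded polynomial form of a quadratic are the same function, which is easy to verify numerically; a minimal numpy sketch for Example 1:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 3.0]])
b = np.array([4.0, 5.0])
c = 10.0

def quadratic(x):
    """y = x^T A x + b^T x + c for the quadratic of Example 1."""
    x = np.asarray(x, dtype=float)
    return x @ A @ x + b @ x + c

# The matrix form agrees with the expanded polynomial at any point:
x1, x2 = 1.0, 2.0
expanded = 2*x1**2 - 2*x1*x2 + 3*x2**2 + 4*x1 + 5*x2 + 10
y = quadratic([x1, x2])
```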
3.10.3 Positive and Negative Definite Matrices

One of the reasons quadratics are so important is their close relationship with the notions of concavity and convexity. For the ordinary quadratic $f(x) = ax^2$, if $a > 0$ then $f''(x) = 2a > 0$ and $f(x)$ is globally convex, while if $a < 0$ then $f''(x) = 2a < 0$ and $f(x)$ is globally concave.

Now instead of $ax^2$, a multivariate quadratic takes the form $f(x) = x^T A x$. It turns out that if $A > 0$, or $A$ is positive definite, then $f(x)$ is convex. In Example 1 above it turns out that $A > 0$, and from the plot it appears that $f(x)$ is indeed valley-like or convex. Similarly, if $A < 0$, or $A$ is negative definite, then $f(x)$ is concave or mountain-like. In Example 2 above it turns out that $A < 0$, and from the plot it appears that $f(x)$ is indeed mountain-like or concave.

Now it is not obvious what we mean if we write $A > 0$ or $A \geq 0$ or $A < 0$ or $A \leq 0$ when $A$ is a symmetric matrix. The key to extending these inequalities to matrices is the quadratic form. If we write for a scalar $a$ that $a \geq 0$, this is equivalent to saying that $ax^2 \geq 0$ for all $x$ (since $x^2 \geq 0$). Similarly, if we say $a > 0$ this is equivalent to saying that $ax^2 > 0$ for all $x \neq 0$. In general we have:
$$a > 0 \iff \left(ax^2 > 0 \text{ for all } x \neq 0\right)$$
$$a \geq 0 \iff \left(ax^2 \geq 0 \text{ for all } x\right)$$
$$a < 0 \iff \left(ax^2 < 0 \text{ for all } x \neq 0\right)$$
$$a \leq 0 \iff \left(ax^2 \leq 0 \text{ for all } x\right)$$
To generalize to matrices we replace $ax^2$ with the quadratic form $x^T A x$. Thus $A \geq 0$ if $x^T A x \geq 0$ for all $x$. This leads to the following definitions, where $x$ is an $n \times 1$ vector:
Definition 191 We say that $A$ is positive definite, or $A > 0$, if and only if:
$$x^T A x > 0$$
for all $x$ except $x = 0$.

Definition 192 We say that $A$ is positive semi-definite, or $A \geq 0$, if and only if:
$$x^T A x \geq 0$$
for all $x$.

Definition 193 We say that $A$ is negative definite, or $A < 0$, if and only if:
$$x^T A x < 0$$
for all $x$ except $x = 0$.

Definition 194 We say that $A$ is negative semi-definite, or $A \leq 0$, if and only if:
$$x^T A x \leq 0$$
for all $x$.
Remark 1: If $x = 0$ then $x^T A x = 0$ no matter what $A$ is. This is the reason why this case is excluded in the definitions of positive and negative definite matrices.

Remark 2: If $A$ is positive (negative) definite, it follows from the definition that the quadratic $x^T A x$ has a unique global minimum (maximum) at $x^* = 0$. This is because $y = x^T A x$ describes an $n+1$ dimensional valley (mountain), or convex (concave) function, with $x^* = 0$ the bottom (top) of the valley (mountain). As we shall see later, it is a matrix of second derivatives (called the Hessian) being positive or negative definite which determines whether any function in $\Re^{n+1}$ is convex or concave.
Example 1: Consider the case where $n = 2$ and:
$$A = \begin{bmatrix} 2 & -5 \\ -5 & 13 \end{bmatrix}$$
in which case:
$$x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & -5 \\ -5 & 13 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2x_1^2 - 10x_1 x_2 + 13x_2^2.$$
We can show that $A$ is positive semi-definite, or $A \geq 0$, since for all $x_1, x_2$:
$$x^T A x = 2x_1^2 - 10x_1 x_2 + 13x_2^2 = (x_1 - 2x_2)^2 + (x_1 - 3x_2)^2 \geq 0$$
since the sum of two squares can never be negative (you can verify that the second expression is correct by expanding the two terms and showing it is equal to the previous one). We therefore conclude that for all $x$:
$$x^T A x \geq 0$$
and so by definition $A$ is positive semi-definite.

We can however prove more: that in fact $A$ is positive definite. Suppose $x^T A x = 0$. This could only occur if $(x_1 - 2x_2)^2 = 0$ and $(x_1 - 3x_2)^2 = 0$, which in turn implies that $x_1 = 2x_2$ and $x_1 = 3x_2$. This can only occur if $x_1 = x_2 = 0$, since otherwise:
$$x_1 = 2x_2 \text{ and } x_1 = 3x_2 \Longrightarrow 2x_2 = 3x_2 \Longrightarrow 2 = 3$$
which is a contradiction. Thus $x^T A x = 0$ only occurs when $x = 0$, so that:
$$x^T A x > 0 \text{ for all } x \text{ except } x = 0$$
and so by definition the matrix $A$ is positive definite.
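In practice, definiteness of a symmetric matrix is usually checked via its eigenvalues (this is Theorem 216, stated later in the chapter). A minimal numpy sketch for the matrix of Example 1, together with a spot-check of the quadratic form at random points:

```python
import numpy as np

A = np.array([[2.0, -5.0],
              [-5.0, 13.0]])

# For a symmetric matrix, positive definiteness is equivalent to all
# eigenvalues being strictly positive.
eigs = np.linalg.eigvalsh(A)
positive_definite = bool(np.all(eigs > 0))

# Spot-check: x^T A x > 0 at some nonzero points.
rng = np.random.default_rng(0)
values = [x @ A @ x for x in rng.normal(size=(100, 2))]
```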
Example 2: Consider the case where $n = 2$ and:
$$A = \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}$$
in which case:
$$x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -2x_1^2 + 4x_1 x_2 - 2x_2^2.$$
We can show that $A$ is negative semi-definite, or $A \leq 0$, since for all $x_1, x_2$:
$$x^T A x = -2x_1^2 + 4x_1 x_2 - 2x_2^2 = -2(x_1 - x_2)^2 \leq 0.$$
We cannot however prove that $A < 0$, or that $A$ is negative definite, since there does exist an $x \neq 0$ such that $x^T A x = 0$. In particular, if $x_1 = x_2 = 1$ then:
$$x^T A x = -2(1 - 1)^2 = 0$$
and so $A$ is not negative definite.
The inequality $a > 0$ ($a < 0$) is a strong inequality while $a \geq 0$ ($a \leq 0$) is a weak inequality. This is because if $a > 0$ ($a < 0$) then it follows immediately that $a \geq 0$ ($a \leq 0$), but one cannot conclude from $a \geq 0$ ($a \leq 0$) that $a > 0$ ($a < 0$), since if $a \geq 0$ it is possible that $a = 0$, in which case $a > 0$ would be false. Thus knowing that $a > 0$ is a stronger result than knowing $a \geq 0$, just as knowing that $a \geq 0$ is a weaker result than knowing that $a > 0$.

These same relationships also hold for matrices. In particular:

Theorem 195 If $A > 0$ then $A \geq 0$, so that a positive definite matrix is always positive semi-definite; but a positive semi-definite matrix is not necessarily positive definite.

Theorem 196 If $A < 0$ then $A \leq 0$, so that a negative definite matrix is always negative semi-definite; but a negative semi-definite matrix is not necessarily negative definite.

Example: In Example 2 above the matrix:
$$\begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}$$
is negative semi-definite, and so $A \leq 0$, but it is not negative definite.
Definiteness and the existence of $A^{-1}$

Recall that for scalars, if a number $a$ satisfies $a \geq 0$ but not $a > 0$, then it must be that $a = 0$. For matrices, if $A \geq 0$ but not $A > 0$ ($A$ is positive semi-definite but not positive definite), then it does not follow that $A = 0$, that is, that all elements of $A$ are zero. However, it does follow that $A$ has certain zero-like properties, in particular that its determinant is $0$ and hence it does not have an inverse. This is summarized below:

Theorem 197 If $A > 0$, so that $A$ is positive definite, then it is non-singular, or $A^{-1}$ exists.

Theorem 198 If $A \geq 0$, or $A$ is positive semi-definite, but $A$ is not positive definite, then $A$ is singular, or $A^{-1}$ does not exist.

Theorem 199 If $A < 0$, so that $A$ is negative definite, then it is non-singular, or $A^{-1}$ exists.

Theorem 200 If $A \leq 0$, or $A$ is negative semi-definite, but $A$ is not negative definite, then $A$ is singular and $A^{-1}$ does not exist.

Example: For the two matrices:
$$\begin{bmatrix} 2 & -5 \\ -5 & 13 \end{bmatrix}, \quad \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}$$
the first we showed is positive definite, while the second is negative semi-definite but not negative definite. Note that:
$$\det \begin{bmatrix} 2 & -5 \\ -5 & 13 \end{bmatrix} = 1, \quad \det \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix} = 0$$
so that the second matrix is singular and so does not have an inverse, while the first is non-singular and has an inverse.
A positive definite matrix must have positive diagonal elements, but the off-diagonal elements can be either positive or negative. In general we have:

Theorem 201 If $A > 0$, or $A$ is positive definite, all diagonal elements must be greater than $0$ (or $a_{ii} > 0$ for $i = 1, 2, \ldots, n$).

Theorem 202 If $A \geq 0$, or $A$ is positive semi-definite, all diagonal elements must be greater than or equal to $0$ (or $a_{ii} \geq 0$ for $i = 1, 2, \ldots, n$).

Theorem 203 If $A < 0$, or $A$ is negative definite, all diagonal elements must be less than $0$ (or $a_{ii} < 0$ for $i = 1, 2, \ldots, n$).

Theorem 204 If $A \leq 0$, or $A$ is negative semi-definite, all diagonal elements must be less than or equal to $0$ (or $a_{ii} \leq 0$ for $i = 1, 2, \ldots, n$).

Remark 1: The signs of the diagonal elements provide necessary conditions but not sufficient conditions. For example, it turns out that the matrix:
$$\begin{bmatrix} 1 & 4 \\ 4 & 2 \end{bmatrix}$$
is not positive definite even though the diagonal elements are both positive. For the matrix:
$$A = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}$$
we can immediately conclude that it is not positive definite (nor positive semi-definite) since it has a negative diagonal element $-2$. We can also conclude that it is not negative definite (nor negative semi-definite) since it also has a positive diagonal element. Note that this last example shows that while for ordinary scalars it is always the case that either $a \geq 0$ or $a \leq 0$, this is not true for matrices; that is, for $A$ above it is not the case that $A \geq 0$ and it is not the case that $A \leq 0$.
As usual, it is much easier to analyze diagonal matrices for definiteness. We have:

Theorem 205 If $A$ is a diagonal matrix then $A > 0$, or $A$ is positive definite, if and only if all diagonal elements are greater than $0$ (or $a_{ii} > 0$ for $i = 1, 2, \ldots, n$).

Theorem 206 If $A$ is a diagonal matrix then $A \geq 0$, or $A$ is positive semi-definite, if and only if all diagonal elements are greater than or equal to $0$ (or $a_{ii} \geq 0$ for $i = 1, 2, \ldots, n$).

Theorem 207 If $A$ is a diagonal matrix then $A < 0$, or $A$ is negative definite, if and only if all diagonal elements are less than $0$ (or $a_{ii} < 0$ for $i = 1, 2, \ldots, n$).

Theorem 208 If $A$ is a diagonal matrix then $A \leq 0$, or $A$ is negative semi-definite, if and only if all diagonal elements are less than or equal to $0$ (or $a_{ii} \leq 0$ for $i = 1, 2, \ldots, n$).

Example: The diagonal matrices:
$$\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}, \quad \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}$$
are respectively positive definite, positive semi-definite, negative definite and negative semi-definite. The diagonal matrix:
$$\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}$$
is neither positive definite, positive semi-definite, negative definite nor negative semi-definite.
The Negative of a Positive Definite Matrix

With scalars, if you multiply a positive number by $-1$ you get a negative number. The same result holds for matrices: if you multiply a positive definite matrix by $-1$ you get a negative definite matrix. In general:

Theorem 209 $A$ is positive definite (or $A > 0$) if and only if $-A$ is negative definite (or $-A < 0$).

Theorem 210 $A$ is positive semi-definite (or $A \geq 0$) if and only if $-A$ is negative semi-definite (or $-A \leq 0$).

This means that if you have, say, a positive definite matrix, then you can find another negative definite matrix by multiplying all its elements by $-1$.

Example: Since we have already seen that:
$$\begin{bmatrix} 2 & -5 \\ -5 & 13 \end{bmatrix}$$
is positive definite, it follows that:
$$\begin{bmatrix} -2 & 5 \\ 5 & -13 \end{bmatrix}$$
is negative definite.
The Form $A = B^T B$

Recall that if a scalar is of the form $a = b^2$ then we immediately obtain the weak inequality $a \geq 0$, since a square is always non-negative. Given the additional information $b \neq 0$, we can obtain the strong inequality $a > 0$. Now suppose that the matrix $A$ takes a similar form: $A = B^T B$. We then obtain the weak inequality $A \geq 0$, or that $A$ is positive semi-definite. Furthermore, by restricting $B$ we obtain the strong inequality that $A > 0$, or that $A$ is positive definite. This result turns out to be quite important in econometrics.

Theorem 211 If $B$ is an $m \times n$ matrix and $A = B^T B$, then $A \geq 0$, or $A$ is positive semi-definite.

Theorem 212 If $B$ is an $m \times n$ matrix with $\mathrm{rank}[B] = n$, then $A = B^T B$ is positive definite, or $A > 0$.

Proof. If $A = B^T B$ then define $y$ as $y = Bx$. Now for all $x$:
$$x^T A x = x^T B^T B x = (Bx)^T Bx = y^T y = \|y\|^2 \geq 0.$$
Now if $\mathrm{rank}[B] = n$ then $y = Bx = 0 \Longrightarrow x = 0$. Therefore for $x \neq 0$ it follows that $y \neq 0$, so that $x^T A x = \|y\|^2 > 0$, and hence $A$ is positive definite.

Example: Given:
$$B = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}$$
you can verify that $\mathrm{rank}[B] = 2$; that is, the two columns of $B$ are linearly independent. Thus by Theorem 212 the matrix $A = B^T B$ given by:
$$B^T B = \begin{bmatrix} 3 & -2 & 6 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} = \begin{bmatrix} 49 & 22 \\ 22 & 21 \end{bmatrix}$$
is positive definite.
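This example is easy to verify numerically; a minimal numpy sketch:

```python
import numpy as np

B = np.array([[3.0, 4.0],
              [-2.0, 1.0],
              [6.0, 2.0]])

A = B.T @ B                        # the Gram matrix B^T B
rank_B = np.linalg.matrix_rank(B)  # full column rank: 2

# Full column rank, so by Theorem 212, B^T B should be positive
# definite: all eigenvalues strictly positive.
eigs = np.linalg.eigvalsh(A)
```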
3.10.4 Using Determinants to Check for Definiteness

We can use determinants to check whether a matrix is positive or negative definite. The key concept is the leading principal minor, defined as:

Definition 213 Leading Principal Minors: The $i^{th}$ leading principal minor of the $n \times n$ matrix $A$ is given by $M_i = \det[A_{ii}]$, where $A_{ii}$ is the $i \times i$ matrix obtained from the first $i$ rows and columns of $A$.

Example: If $A$ is given by:
$$A = \begin{bmatrix} 3 & 1 & 2 \\ 1 & 6 & 3 \\ 2 & 3 & 8 \end{bmatrix}$$
we have 3 leading principal minors:
$$M_1 = \det[3] = 3, \quad M_2 = \det \begin{bmatrix} 3 & 1 \\ 1 & 6 \end{bmatrix} = 17, \quad M_3 = \det \begin{bmatrix} 3 & 1 & 2 \\ 1 & 6 & 3 \\ 2 & 3 & 8 \end{bmatrix} = 97.$$
We have:
Theorem 214 The matrix $A$ is positive definite if and only if all the leading principal minors are strictly positive; that is: $M_1 > 0, M_2 > 0, \ldots, M_n > 0$.

Theorem 215 The matrix $A$ is negative definite if and only if the leading principal minors alternate in sign with the first being negative, or: $M_1 < 0, M_2 > 0, M_3 < 0, \ldots$.
Example 1: For a general $2 \times 2$ matrix:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}$$
to be positive definite we require that $M_1 = a_{11} > 0$ and $M_2 = \det[A] = a_{11}a_{22} - a_{12}^2 > 0$. The last result implies that:
$$|a_{12}| < \sqrt{a_{11}a_{22}}.$$
Thus, in addition to the diagonal elements being positive, we require that the off-diagonal elements not be too large in absolute value relative to the diagonal elements. Thus the matrices:
$$\begin{bmatrix} 1 & 4 \\ 4 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & -4 \\ -4 & 2 \end{bmatrix}$$
are both not positive definite, since the off-diagonal element, here either $4$ or $-4$, is too large relative to the diagonal elements, or $4 > \sqrt{1 \times 2}$. Thus for both matrices $M_1 = 1 > 0$ but $M_2 = -14 < 0$, and so neither matrix is positive definite.
Example 2: The matrix we considered above:
$$A = \begin{bmatrix} 3 & 1 & 2 \\ 1 & 6 & 3 \\ 2 & 3 & 8 \end{bmatrix}$$
is positive definite since $M_1 = 3 > 0$, $M_2 = 17 > 0$ and $M_3 = 97 > 0$.

Example 3: If:
$$A = \begin{bmatrix} -3 & -1 & -2 \\ -1 & -6 & -3 \\ -2 & -3 & -8 \end{bmatrix}$$
then:
$$M_1 = \det[-3] = -3 < 0, \quad M_2 = \det \begin{bmatrix} -3 & -1 \\ -1 & -6 \end{bmatrix} = 17 > 0, \quad M_3 = \det \begin{bmatrix} -3 & -1 & -2 \\ -1 & -6 & -3 \\ -2 & -3 & -8 \end{bmatrix} = -97 < 0.$$
Since the leading principal minors alternate in sign with $M_1 < 0$, it follows that $A$ is negative definite.
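The leading-principal-minor test is straightforward to implement; a minimal numpy sketch (the helper name `leading_principal_minors` is ours):

```python
import numpy as np

def leading_principal_minors(A):
    """Return [M_1, ..., M_n], the leading principal minors of A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return [np.linalg.det(A[:i, :i]) for i in range(1, n + 1)]

A = np.array([[3.0, 1.0, 2.0],
              [1.0, 6.0, 3.0],
              [2.0, 3.0, 8.0]])

minors = leading_principal_minors(A)        # Example 2: [3, 17, 97]
is_positive_definite = all(m > 0 for m in minors)

# Negating A (Example 3) flips the signs to the alternating
# pattern -, +, - of a negative definite matrix:
minors_neg = leading_principal_minors(-A)   # [-3, 17, -97]
```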
Example 4: Given:
\[
B = \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix}
\]
you can verify that $\operatorname{rank}[B] = 2$; that is, the two columns of $B$ are linearly independent. Thus Theorem 212 predicts that $B^T B$ is positive definite. To verify this note that:
\[
B^T B = \begin{bmatrix} 3 & -2 & 6 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ -2 & 1 \\ 6 & 2 \end{bmatrix} = \begin{bmatrix} 49 & 22 \\ 22 & 21 \end{bmatrix}
\]
which is positive definite since from the leading principal minors: $M_1 = 49 > 0$ and $M_2 = 545 > 0$.
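Example 4 can also be verified numerically (variable names below are ours):

```python
import numpy as np

# Checking Example 4: B has rank 2, so B^T B should be positive definite,
# which its leading principal minors confirm.
B = np.array([[3., 4.],
              [-2., 1.],
              [6., 2.]])
G = B.T @ B                         # the Gram matrix B^T B = [[49, 22], [22, 21]]
M1 = G[0, 0]                        # = 49 > 0
M2 = np.linalg.det(G)               # = 49*21 - 22^2 = 545 > 0
rank_B = np.linalg.matrix_rank(B)   # = 2, i.e. the columns are independent
```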
Remark 1: At first it is tricky to remember the condition for $A$ to be negative definite, since intuitively one would think that all leading principal minors must be negative. It may help you to remember the rule if you consider a diagonal matrix with negative elements, which we know is negative definite. For example:
\[
\begin{bmatrix} -1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -3 \end{bmatrix}.
\]
Here $M_1 = -1 < 0$ but $M_2 = -1 \times -2 = 2 > 0$ because the product of two negative numbers is positive. Finally $M_3 = -1 \times -2 \times -3 = -6 < 0$. This is why the $M_i$ must alternate in sign for negative definite matrices.
Remark 2: We cannot easily extend these criteria to positive semi-definite and negative semi-definite matrices. For example it does not follow from $M_1 \ge 0, M_2 \ge 0, \ldots$ that $A$ is positive semi-definite. For example the matrix:
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]
has $M_1 = 1 \ge 0$, $M_2 = 0 \ge 0$ and $M_3 = 0 \ge 0$, but this matrix is not positive semi-definite since it has a negative diagonal element.
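The counterexample is easy to confirm numerically: the leading principal minors are all non-negative, yet one eigenvalue (and the corresponding diagonal element) is negative.

```python
import numpy as np

# Remark 2's counterexample: every leading principal minor is >= 0, yet the
# matrix has a negative eigenvalue and so is not positive semi-definite.
A = np.diag([1.0, 0.0, -1.0])
minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]   # [1, 0, 0]
eigenvalues = np.linalg.eigvalsh(A)                      # [-1, 0, 1]
```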
3.10.5 Using Eigenvalues to Check for Definiteness
An alternative method for testing for definiteness is to use eigenvalues. We have:
Theorem 216 If $A$ is a symmetric $n \times n$ matrix with eigenvalues $\lambda_i$, $i = 1, 2, \ldots, n$, then:

1. $A$ is positive definite ($A > 0$) if and only if $\lambda_i > 0$ for $i = 1, 2, \ldots, n$.
2. $A$ is positive semi-definite ($A \ge 0$) if and only if $\lambda_i \ge 0$ for $i = 1, 2, \ldots, n$.
3. $A$ is negative definite ($A < 0$) if and only if $\lambda_i < 0$ for $i = 1, 2, \ldots, n$.
4. $A$ is negative semi-definite ($A \le 0$) if and only if $\lambda_i \le 0$ for $i = 1, 2, \ldots, n$.
Proof. We prove only 1 and 2; 3 and 4 follow the same reasoning. Given that $A$ is symmetric we have $A = C \Lambda C^T$ where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ along its diagonal and $C$ is an orthogonal matrix, so that $C^T = C^{-1}$. We then have:
\begin{align*}
x^T A x &= x^T C \Lambda C^T x \\
&= y^T \Lambda y \\
&= y_1^2 \lambda_1 + y_2^2 \lambda_2 + \cdots + y_n^2 \lambda_n
\end{align*}
where $y = C^T x$ is an $n \times 1$ vector. Since $\Lambda$ is a diagonal matrix it follows that $x^T A x = y^T \Lambda y \ge 0$ if and only if $\lambda_i \ge 0$ for all $i$, and so 2 follows. Since $C^T$ is non-singular, it follows that $x = 0$ if and only if $y = 0$, so that $x^T A x = y^T \Lambda y > 0$ for $x \ne 0$ if and only if $\lambda_i > 0$ for all $i$, and so 1 follows.
Example 1: If $A$ is given by:
\[
A = \begin{bmatrix} 2 & -5 \\ -5 & 13 \end{bmatrix}
\]
we find that the eigenvalues are
\begin{align*}
\lambda_1 &= \frac{15}{2} + \frac{1}{2}\sqrt{221} = 14.933 > 0 \\
\lambda_2 &= \frac{15}{2} - \frac{1}{2}\sqrt{221} = 0.066966 > 0.
\end{align*}
It follows that $A$ is positive definite.
Example 2: If $A$ is given by:
\[
A = \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}
\]
we find that the eigenvalues are
\begin{align*}
\lambda_1 &= -4 < 0 \\
\lambda_2 &= 0.
\end{align*}
Since all the eigenvalues satisfy $\lambda_i \le 0$ it follows that $A$ is negative semi-definite. However since $\lambda_2 = 0$ it follows that $A$ is not negative definite.
Example 3: If $A$ is given by:
\[
A = \begin{bmatrix} 1 & 4 \\ 4 & 2 \end{bmatrix}
\]
we find that the eigenvalues are
\begin{align*}
\lambda_1 &= \frac{3}{2} + \frac{1}{2}\sqrt{65} = 5.531 > 0 \\
\lambda_2 &= \frac{3}{2} - \frac{1}{2}\sqrt{65} = -2.531 < 0.
\end{align*}
Since the eigenvalues have opposite signs, the matrix $A$ is neither positive definite, positive semi-definite, negative definite nor negative semi-definite.
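Theorem 216 translates directly into code. The sketch below (the classifier function and its name are ours, not from the text) applies it to the three examples, using `np.linalg.eigvalsh`, which computes the eigenvalues of a symmetric matrix:

```python
import numpy as np

# A sketch of Theorem 216: classify a symmetric matrix by the signs
# of its eigenvalues.
def definiteness(A, tol=1e-10):
    lam = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semi-definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semi-definite"
    return "indefinite"

d1 = definiteness(np.array([[2., -5.], [-5., 13.]]))   # positive definite
d2 = definiteness(np.array([[-2., 2.], [2., -2.]]))    # negative semi-definite
d3 = definiteness(np.array([[1., 4.], [4., 2.]]))      # indefinite
```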
3.10.6 Maximizing and Minimizing Quadratics
Consider the problem of maximizing (minimizing) the quadratic:
\[
y = x^T A x + b^T x + c
\]
where $x$ is an $n \times 1$ vector, $A$ is a symmetric $n \times n$ matrix, $b$ is an $n \times 1$ vector and $c$ is a scalar. We also assume that $A$ is negative (positive) definite, which implies that $A^{-1}$ exists. In the next chapter we will see how to solve this problem using multivariate calculus.
It is possible to find the optimal $x^*$ without calculus using a technique called completing the square. We have:
Theorem 217 The value of $x$ which maximizes (minimizes)
\[
y = x^T A x + b^T x + c
\]
where $A$ is negative (positive) definite is:
\[
x^* = -\frac{1}{2} A^{-1} b.
\]
Proof. Completing the square amounts to showing that:
\[
x^T A x + b^T x + c = (x - x^*)^T A (x - x^*) + c - \frac{b^T A^{-1} b}{4}.
\]
You can verify this as follows:
\begin{align*}
(x - x^*)^T A (x - x^*) &= x^T A x - x^{*T} A x - x^T A x^* + x^{*T} A x^* \\
&= x^T A x + \frac{1}{2} b^T A^{-1} A x + \frac{1}{2} x^T A A^{-1} b + \frac{1}{4} b^T A^{-1} A A^{-1} b \\
&= x^T A x + \frac{1}{2} b^T x + \frac{1}{2} x^T b + \frac{1}{4} b^T A^{-1} b \\
&= x^T A x + b^T x + \frac{1}{4} b^T A^{-1} b
\end{align*}
since $b^T x = x^T b$. Thus:
\[
x^T A x + b^T x = (x - x^*)^T A (x - x^*) - \frac{1}{4} b^T A^{-1} b
\]
from which it follows that:
\[
y = (x - x^*)^T A (x - x^*) + c - \frac{b^T A^{-1} b}{4}.
\]
If $A$ is negative definite then it follows that:
\begin{align*}
(x - x^*)^T A (x - x^*) &< 0 \text{ for } (x - x^*) \ne 0 \\
(x - x^*)^T A (x - x^*) &= 0 \text{ for } (x - x^*) = 0
\end{align*}
and hence:
\[
y \le c - \frac{b^T A^{-1} b}{4}
\]
with equality only when $x = x^*$. If $A$ is positive definite then replace $<$ with $>$ above and the result follows. It follows then that $x^*$ is a global maximum (minimum).
Example 1: If $n = 2$ and
\[
A = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \quad c = 10
\]
then
\[
y = x^T A x + b^T x + c \implies y = 2x_1^2 - 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 + 10.
\]
You can check that the matrix $A$ is positive definite since $M_1 = 2 > 0$ and $M_2 = 5 > 0$, so we will look for a minimum of the quadratic.

[Figure: surface plot of $y = 2x_1^2 - 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 + 10$.]

We have:
\[
x^* = -\frac{1}{2} A^{-1} b = -\frac{1}{2} \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} -\frac{17}{10} \\ -\frac{7}{5} \end{bmatrix}
\]
so that the global minimum occurs at $x_1^* = -\frac{17}{10}$ and $x_2^* = -\frac{7}{5}$.
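Example 1 can be checked numerically. The sketch below (names are ours) computes $x^* = -\frac{1}{2} A^{-1} b$ by solving a linear system rather than forming the inverse:

```python
import numpy as np

# Example 1 numerically: the minimizer is x* = -(1/2) A^{-1} b.
A = np.array([[2., -1.],
              [-1., 3.]])
b = np.array([4., 5.])
c = 10.0
x_star = -0.5 * np.linalg.solve(A, b)   # = [-17/10, -7/5]

def quadratic(x):
    return x @ A @ x + b @ x + c

# Completing the square predicts that perturbing x_star in any
# direction can only raise the value of the quadratic.
```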
Example 2: The linear regression model is:
\[
Y = X\beta + e
\]
where $Y$ is an $n \times 1$ vector, $X$ is an $n \times p$ matrix of rank $p$, and $e$ is an $n \times 1$ vector of random errors. The least squares estimator $\hat{\beta}$ is the value of $\beta$ which minimizes the sum of squares function:
\begin{align*}
S(\beta) &= (Y - X\beta)^T (Y - X\beta) \\
&= \beta^T X^T X \beta - 2 Y^T X \beta + Y^T Y.
\end{align*}
Although the notation may cover this up, in fact $S(\beta)$ is a quadratic. Here $x$ is $\beta$, $x^*$ is $\hat{\beta}$, $A = X^T X$, $b = -2X^T Y$ and $c = Y^T Y$. If $\operatorname{rank}[X] = p$ then $A$ is positive definite by Theorem 212, and so making the translation in notation from
\[
x^* = -\frac{1}{2} A^{-1} b
\]
we find that:
\begin{align*}
\hat{\beta} &= -\frac{1}{2} A^{-1} b \\
&= -\frac{1}{2} \overbrace{\left( X^T X \right)^{-1}}^{= A^{-1}} \overbrace{\left( -2 X^T Y \right)}^{= b} \\
&= \left( X^T X \right)^{-1} X^T Y.
\end{align*}
The formula $\hat{\beta} = \left( X^T X \right)^{-1} X^T Y$ is one of the central results in econometrics.
Example 3: Suppose one has data on 10 families, where $Y_i$ is the consumption of family $i$, $X_{i1}$ is the income of family $i$ and $X_{i2}$ is the wealth of family $i$, and suppose that:
\[
Y_i = X_{i1}\beta_1 + X_{i2}\beta_2 + e_i \text{ for } i = 1, 2, \ldots, 10.
\]
The parameter $\beta_1$ is the marginal propensity to consume out of income, $\beta_2$ is the marginal propensity to consume out of wealth, and $e_i$ is a random error.
Suppose that the actual data takes the form:
\[
Y = \begin{bmatrix} 22.1 \\ 88.2 \\ 7.8 \\ 29.2 \\ 8.8 \\ 217.4 \\ 35.1 \\ 61.9 \\ 11.4 \\ 52.7 \end{bmatrix}, \quad X = \begin{bmatrix} 10 & 100 \\ 25 & 355 \\ 7 & 10 \\ 41 & 50 \\ 10 & 3 \\ 75 & 860 \\ 21 & 62 \\ 77 & 107 \\ 21 & 10 \\ 71 & 45 \end{bmatrix}
\]
so that for example family 1's consumption is $22.1$, their income is $10$ and their wealth is $100$. From this data we wish to estimate $\beta_1$ and $\beta_2$ using the least squares estimator: $\hat{\beta} = \left( X^T X \right)^{-1} X^T Y$.
We have:
\[
X^T X = \begin{bmatrix} 10 & 25 & 7 & 41 & 10 & 75 & 21 & 77 & 21 & 71 \\ 100 & 355 & 10 & 50 & 3 & 860 & 62 & 107 & 10 & 45 \end{bmatrix} \begin{bmatrix} 10 & 100 \\ 25 & 355 \\ 7 & 10 \\ 41 & 50 \\ 10 & 3 \\ 75 & 860 \\ 21 & 62 \\ 77 & 107 \\ 21 & 10 \\ 71 & 45 \end{bmatrix} = \begin{bmatrix} 20032 & 89471 \\ 89471 & 895652 \end{bmatrix}
\]
and:
\[
X^T Y = \begin{bmatrix} 10 & 25 & 7 & 41 & 10 & 75 & 21 & 77 & 21 & 71 \\ 100 & 355 & 10 & 50 & 3 & 860 & 62 & 107 & 10 & 45 \end{bmatrix} \begin{bmatrix} 22.1 \\ 88.2 \\ 7.8 \\ 29.2 \\ 8.8 \\ 217.4 \\ 35.1 \\ 61.9 \\ 11.4 \\ 52.7 \end{bmatrix} = \begin{bmatrix} 29555.3 \\ 233334.4 \end{bmatrix}.
\]
Thus the least squares estimator $\hat{\beta}$ is given by:
\begin{align*}
\hat{\beta} &= \left( X^T X \right)^{-1} X^T Y \\
&= \begin{bmatrix} 20032 & 89471 \\ 89471 & 895652 \end{bmatrix}^{-1} \begin{bmatrix} 29555.3 \\ 233334.4 \end{bmatrix} \\
&= \begin{bmatrix} 0.56 \\ 0.20 \end{bmatrix}.
\end{align*}
The estimated marginal propensity to consume out of income is then $\hat{\beta}_1 = 0.56$ while the estimated marginal propensity to consume out of wealth is $\hat{\beta}_2 = 0.20$.
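The whole calculation is a few lines of NumPy (variable names are ours). Solving the normal equations $(X^T X)\beta = X^T Y$ with `np.linalg.solve` avoids forming the inverse explicitly:

```python
import numpy as np

# Reproducing the consumption example numerically.
Y = np.array([22.1, 88.2, 7.8, 29.2, 8.8, 217.4, 35.1, 61.9, 11.4, 52.7])
X = np.array([[10, 100], [25, 355], [7, 10], [41, 50], [10, 3],
              [75, 860], [21, 62], [77, 107], [21, 10], [71, 45]], dtype=float)

XtX = X.T @ X                          # [[20032, 89471], [89471, 895652]]
XtY = X.T @ Y                          # [29555.3, 233334.4]
beta_hat = np.linalg.solve(XtX, XtY)   # approximately [0.56, 0.20]
```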
3.11 Idempotent Matrices

There are only two scalars that have the property $a^2 = a$: $0^2 = 0$ and $1^2 = 1$. An idempotent matrix is like $0$ or $1$ in that it also has this property. Thus:

Definition 218 Idempotent Matrix: An $n \times n$ matrix $P$ is said to be idempotent if
\[
P P = P.
\]
Example 1: Recall from the linear regression model $Y = X\beta + e$ that $\hat{\beta} = \left( X^T X \right)^{-1} X^T Y$. The vector of fitted values, or the values of $Y$ predicted by the estimated model, is given by:
\begin{align*}
\hat{Y} &= X\hat{\beta} \\
&= X \left( X^T X \right)^{-1} X^T Y \\
&= P Y
\end{align*}
where $P$ is given by:
\[
P = X \left( X^T X \right)^{-1} X^T.
\]
$P$ is idempotent since:
\begin{align*}
P P &= X \left( X^T X \right)^{-1} \underbrace{X^T X \left( X^T X \right)^{-1}}_{= I} X^T \\
&= X I \left( X^T X \right)^{-1} X^T \\
&= X \left( X^T X \right)^{-1} X^T = P.
\end{align*}
The least squares residual, or the part of $Y$ that the model cannot explain, is given by:
\begin{align*}
\hat{e} &\equiv Y - \hat{Y} \\
&= Y - P Y \\
&= (I - P) Y.
\end{align*}
The matrix $I - P$ is also idempotent since:
\begin{align*}
(I - P)(I - P) &= I - IP - PI + PP \\
&= I - P - P + P \\
&= I - P.
\end{align*}
Since $PP = P$ it can further be shown that
\[
P(I - P) = P - PP = P - P = 0.
\]
Example 2: If
\[
X = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\]
then in econometrics this would correspond to the regression $Y_i = \mu + e_i$ with 3 observations and a constant term.
To calculate $P$ for this $X$ note that:
\[
X^T X = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 3.
\]
Thus:
\begin{align*}
P &= X \left( X^T X \right)^{-1} X^T \\
&= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} 3^{-1} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \\
&= \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}.
\end{align*}
We know $P$ is idempotent but you might want to check this by multiplying $P$ by itself.
The idempotent matrix $I - P$ is then:
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.
\]
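The idempotency claims for this $P$ are quick to verify in code (variable names are ours):

```python
import numpy as np

# Example 2's projection matrix for X = (1, 1, 1)^T, checked numerically.
X = np.ones((3, 1))
P = X @ np.linalg.inv(X.T @ X) @ X.T   # = (1/3) * ones(3, 3)
M = np.eye(3) - P                      # the matrix I - P

P_idempotent = np.allclose(P @ P, P)
M_idempotent = np.allclose(M @ M, M)
products_vanish = np.allclose(P @ M, 0)   # P(I - P) = 0
```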
3.11.1 Important Properties of Idempotent Matrices
Idempotent matrices have the following properties:

Theorem 219 If $P$ is idempotent then:

1. The eigenvalues of $P$ are all either $0$ or $1$.
2. If $P$ is symmetric then it is positive semi-definite.
3. $\operatorname{tr}[P] = \operatorname{rank}[P]$.
4. If $x$ and $y$ are two $n \times 1$ vectors and if $P$ is symmetric then $w = Px$ and $z = (I - P)y$ are orthogonal; that is $w^T z = 0$.
Proof. If $P$ is idempotent then an eigenvector $x \ne 0$ and eigenvalue $\lambda$ satisfy:
\[
P x = \lambda x.
\]
Multiplying both sides by $P$ we find that:
\[
P P x = \lambda P x \implies \lambda x = \lambda^2 x \implies \lambda (1 - \lambda) x = 0
\]
since $P x = \lambda x$ and $P P x = P x = \lambda x$. Now since $x \ne 0$ it follows that $\lambda(1 - \lambda) = 0$, so that $\lambda = 0$ or $\lambda = 1$. Since $\lambda = 0$ or $\lambda = 1$ it follows that $\lambda \ge 0$, and so a symmetric $P$ is positive semi-definite. Furthermore $\operatorname{tr}[P]$ is the sum of the eigenvalues, which equals the number of eigenvalues equal to $1$, which is the rank of $P$.
If $P$ is symmetric, so that $P = P^T$, and $w = Px$ and $z = (I - P)y$, then:
\begin{align*}
w^T z &= (Px)^T (I - P) y = x^T P^T (I - P) y \\
&= x^T P (I - P) y = x^T (P - PP) y = x^T (P - P) y \\
&= x^T 0 y = 0
\end{align*}
and so $w$ and $z$ are orthogonal.
Example 1: Consider the idempotent matrices $P$ and $I - P$ from the previous example, where:
\[
P = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}, \quad I - P = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}.
\]
The eigenvalues of $P$ are determined from the characteristic polynomial:
\[
f(\lambda) = \det[P - \lambda I] = \det \begin{bmatrix} \frac{1}{3} - \lambda & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} - \lambda & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} - \lambda \end{bmatrix} = \lambda^2 - \lambda^3 = \lambda^2 (1 - \lambda) = 0
\]
and so the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 0$ and $\lambda_3 = 0$.
Since the eigenvalues satisfy $\lambda \ge 0$ it follows that $P$ is positive semi-definite. Since two of the eigenvalues are $0$, however, $P$ is not positive definite and hence $P^{-1}$ does not exist.
The trace of $P$ is given by:
\[
\operatorname{tr}[P] = \frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1
\]
which is the rank of $P$ (i.e., $P$ only has one linearly independent column).
The trace of $I - P$ is:
\[
\operatorname{tr}[I - P] = \frac{2}{3} + \frac{2}{3} + \frac{2}{3} = 2
\]
which is also equal to $\operatorname{rank}[I - P]$.
Let us take any $3 \times 1$ vector $x$, say:
\[
x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]
and multiply $x$ by $P$ and $I - P$ to obtain:
\begin{align*}
w &= P x = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix} \\
z &= (I - P) x = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
\end{align*}
Note that $w$ and $z$ are orthogonal since:
\[
w^T z = \begin{bmatrix} 2 & 2 & 2 \end{bmatrix} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = 2 \times -1 + 2 \times 0 + 2 \times 1 = 0.
\]
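All four properties of Theorem 219 can be confirmed for this $P$ in a few lines (variable names are ours):

```python
import numpy as np

# Theorem 219's properties, verified for P = (1/3) * ones(3, 3).
P = np.full((3, 3), 1/3)
eigenvalues = np.linalg.eigvalsh(P)     # 0, 0 and 1: each eigenvalue is 0 or 1
trace_P = np.trace(P)                   # = 1
rank_P = np.linalg.matrix_rank(P)       # = 1, so tr[P] = rank[P]

x = np.array([1., 2., 3.])
w = P @ x                               # [2, 2, 2]
z = (np.eye(3) - P) @ x                 # [-1, 0, 1]
inner = w @ z                           # = 0: w and z are orthogonal
```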
Example 2: The vector of fitted values $\hat{Y}$ and the vector of least squares residuals are given by:
\[
\hat{Y} = P Y, \quad \hat{e} = (I - P) Y
\]
where $P = X \left( X^T X \right)^{-1} X^T$. Earlier we showed that $\operatorname{tr}[P] = p$, and so the rank of the $n \times n$ matrix $P$ is $p$. It follows from the above theorem that the two vectors are orthogonal, or:
\begin{align*}
\hat{Y}^T \hat{e} &= Y^T P^T (I - P) Y \\
&= Y^T P (I - P) Y \\
&= Y^T \left( P - P^2 \right) Y \\
&= Y^T 0 Y \\
&= 0.
\end{align*}
Since $Y = \hat{Y} + \hat{e}$ where $\hat{Y}$ and $\hat{e}$ are orthogonal, there exists from Theorem 155 a Pythagorean relationship:
\[
Y^T Y = \hat{Y}^T \hat{Y} + \hat{e}^T \hat{e}.
\]
Dividing both sides by $Y^T Y$ we obtain:
\[
1 = \frac{\hat{Y}^T \hat{Y}}{Y^T Y} + \frac{\hat{e}^T \hat{e}}{Y^T Y}.
\]
The first term on the right is the uncentered $R^2$ defined by:
\[
R^2 = \frac{\hat{Y}^T \hat{Y}}{Y^T Y} = \frac{\| \hat{Y} \|^2}{\| Y \|^2}
\]
which measures the percentage variation in $Y$ explained by the regression model.
which measures the percentage variation in Y explained by the regression model.
Alternatively from De…nition 152 the angle between Y and Y^ is:
Y^ T Y
° °
° °
kY k °Y^ °
³
´
Y^ T Y^ + e^
° °
=
° °
kY k °Y^ °
° °
° °2
°^°
°^°
°Y °
°Y °
° °=
=
° °
kY k
kY k °Y^ °
p
R2 :
=
cos (µ) =
Basically the closer R2 is to 1 the smaller the angle between Y and Y^ the more
the model explains Y:
Now since:
\[
R^2 = \frac{\| \hat{Y} \|^2}{\| Y \|^2} = 1 - \frac{\| \hat{e} \|^2}{\| Y \|^2}
\]
it follows that:
\[
0 \le R^2 \le 1.
\]
You might want to try to show that $R^2 = 0$ if and only if $\hat{\beta} = 0$ (the model explains nothing) and $R^2 = 1$ if and only if $\hat{e} = 0$ (the model is a perfect fit).
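For the consumption data of Example 3 above, the Pythagorean split and the uncentered $R^2$ can be computed directly (a sketch with our own variable names, using `np.linalg.lstsq` to obtain the least squares fit):

```python
import numpy as np

# The uncentered R^2 for the consumption data, via the split Y = Y_hat + e_hat.
Y = np.array([22.1, 88.2, 7.8, 29.2, 8.8, 217.4, 35.1, 61.9, 11.4, 52.7])
X = np.array([[10, 100], [25, 355], [7, 10], [41, 50], [10, 3],
              [75, 860], [21, 62], [77, 107], [21, 10], [71, 45]], dtype=float)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat
e_hat = Y - Y_hat

R2 = (Y_hat @ Y_hat) / (Y @ Y)
pythagorean = np.isclose(Y @ Y, Y_hat @ Y_hat + e_hat @ e_hat)
```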
3.11.2 The Spectral Representation

Closely related to the representation $A = C \Lambda C^{-1}$ is the spectral representation. We have:
Theorem 220 The Spectral Representation: Given an $n \times n$ matrix $A$ written as $A = C \Lambda C^{-1}$, then
\[
A = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_n P_n
\]
where the $n \times n$ matrices $P_i$ are given by:
\[
P_i = \frac{x_i y_i^T}{x_i^T y_i}
\]
and where $x_i$ and $y_i$ are the right and left-hand eigenvectors corresponding to the eigenvalue $\lambda_i$ of $A$. The matrices $P_i$ are idempotent and orthogonal to each other in that $P_i \times P_i = P_i$ and $P_i P_j = 0$ for $i \ne j$.
Remark: That the matrices $P_i$ are idempotent follows from the fact that:
\[
P_i \times P_i = \frac{x_i y_i^T}{x_i^T y_i} \times \frac{x_i y_i^T}{x_i^T y_i} = \frac{x_i \overbrace{\left( y_i^T x_i \right)}^{\text{a scalar}} y_i^T}{\left( x_i^T y_i \right)^2} = \frac{x_i y_i^T}{x_i^T y_i} = P_i.
\]
That they are orthogonal follows from Theorem 185, that left and right-hand eigenvectors from different eigenvalues are orthogonal, or $y_i^T x_j = 0$ for $i \ne j$. Thus:
\[
P_i \times P_j = \frac{x_i y_i^T}{x_i^T y_i} \times \frac{x_j y_j^T}{x_j^T y_j} = \frac{x_i \overbrace{\left( y_i^T x_j \right)}^{= 0} y_j^T}{\left( x_i^T y_i \right) \left( x_j^T y_j \right)} = 0.
\]
An implication of the spectral representation then is that:

Theorem 221 Given the spectral representation for the $n \times n$ matrix $A$:
\[
A = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_n P_n
\]
then the $m$th power of $A$ is given by:
\[
A^m = \lambda_1^m P_1 + \lambda_2^m P_2 + \cdots + \lambda_n^m P_n.
\]
Proof. We will prove this by induction. Note that it is obviously true for $m = 1$. Now suppose it is true for $m - 1$. We then have:
\begin{align*}
A^m &= A^{m-1} \times A \\
&= \left( \lambda_1^{m-1} P_1 + \lambda_2^{m-1} P_2 + \cdots + \lambda_n^{m-1} P_n \right) \times \left( \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_n P_n \right).
\end{align*}
Since $P_i P_j = 0$ for $i \ne j$, any cross-product terms drop out and hence:
\begin{align*}
A^m &= \left( \lambda_1^{m-1} \times \lambda_1 \right) P_1 \times P_1 + \left( \lambda_2^{m-1} \times \lambda_2 \right) P_2 \times P_2 + \cdots + \left( \lambda_n^{m-1} \times \lambda_n \right) P_n \times P_n \\
&= \lambda_1^m P_1 + \lambda_2^m P_2 + \cdots + \lambda_n^m P_n
\end{align*}
since $P_i \times P_i = P_i$ and $\lambda_i^{m-1} \times \lambda_i = \lambda_i^m$.
When a matrix is symmetric its left and right-hand eigenvectors are identical, and so the spectral representation takes the form:

Theorem 222 If the $n \times n$ matrix $A$ is symmetric then
\[
A = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_n P_n
\]
where the $n \times n$ matrices $P_i$ are given by:
\[
P_i = \frac{x_i x_i^T}{x_i^T x_i}
\]
and where $x_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$ of $A$. The matrices $P_i$ are idempotent and orthogonal to each other in that $P_i \times P_i = P_i$ and $P_i P_j = 0$ for $i \ne j$.
Example: For $A$ below the spectral representation is:
\[
A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix} = \overbrace{3}^{=\lambda_1} \overbrace{\begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}}^{=P_1} + \overbrace{7}^{=\lambda_2} \overbrace{\begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}}^{=P_2}.
\]
Note that $P_1$ and $P_2$ are idempotent, that:
\[
P_1 \times P_2 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
and that:
\[
A^2 = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}^2 = \begin{bmatrix} 29 & -20 \\ -20 & 29 \end{bmatrix} = 3^2 \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix} + 7^2 \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}
\]
which you can verify. Also:
\[
A^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} \frac{5}{21} & \frac{2}{21} \\ \frac{2}{21} & \frac{5}{21} \end{bmatrix} = 3^{-1} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix} + 7^{-1} \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}
\]
which you can verify by multiplying $A$ and $A^{-1}$.
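The example's decomposition, and the powers $m = 2$ and $m = -1$ of Theorem 221, can be verified numerically. Since `np.linalg.eigh` returns unit-length eigenvectors, $P_i = x_i x_i^T / (x_i^T x_i)$ reduces to the outer product $x_i x_i^T$ (variable names are ours):

```python
import numpy as np

# The spectral representation of the example's symmetric matrix.
A = np.array([[5., -2.],
              [-2., 5.]])
lam, C = np.linalg.eigh(A)                      # lam = [3, 7]
P1 = np.outer(C[:, 0], C[:, 0])                 # projection onto first eigenvector
P2 = np.outer(C[:, 1], C[:, 1])                 # projection onto second eigenvector

A_rebuilt = lam[0] * P1 + lam[1] * P2           # = A
A_squared = lam[0]**2 * P1 + lam[1]**2 * P2     # Theorem 221 with m = 2
A_inverse = lam[0]**-1 * P1 + lam[1]**-1 * P2   # and with m = -1
```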
3.12 Positive Matrices

3.12.1 The Perron-Frobenius Theorem

In many economic applications it often happens that the elements of a matrix are all positive, in which case we have:

Definition 223 Positive Matrix: We say a matrix $A = [a_{ij}]$ is positive if $a_{ij} > 0$ for all $i, j$.
Example: The matrix:
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\]
is a positive matrix. Note that a positive matrix is not the same as a positive definite matrix.
Positive matrices occur in many economic applications: for example with input-output matrices, which describe the technological interdependency of the different industries in an economy, and Markov chains, which describe how probabilities vary over time.
In these applications the eigenvectors and eigenvalues turn out to be quite important. For example with input-output matrices an eigenvector determines equilibrium prices or the balanced growth vector, and the associated eigenvalue determines both the rate of profit and the growth rate of the economy. We can then show that all prices will be positive, that the equilibrium price vector is unique, and that the growth rate of the economy is maximized by appealing to the Perron-Frobenius theorem:
Theorem 224 The Perron-Frobenius Theorem I: If $A = [a_{ij}]$ is an $n \times n$ positive square matrix then:

1. $A$ has a unique positive eigenvalue $\hat{\lambda} > 0$.
2. If $\lambda_i$ is any other eigenvalue of $A$ then $|\lambda_i| < \hat{\lambda}$.
3. Associated with $\hat{\lambda}$ is a positive $n \times 1$ right-hand eigenvector $\hat{x} = [\hat{x}_i]$ (i.e., with $\hat{x}_i > 0$) and a positive left-hand eigenvector $\hat{y} = [\hat{y}_i]$ (i.e., with $\hat{y}_i > 0$) which satisfy:
\[
A \hat{x} = \hat{\lambda} \hat{x}, \quad \hat{y}^T A = \hat{\lambda} \hat{y}^T.
\]
These positive eigenvectors are unique up to a scalar multiple.
4. No other eigenvectors exist which have all positive elements.
Remark: Note that we have assumed for $A = [a_{ij}]$ that $a_{ij} > 0$, and so we have not allowed any $a_{ij} = 0$. This assumption can be relaxed considerably as long as $A$ remains indecomposable, which requires that $A^n$ have all positive elements for some $n$.
Example: Consider the $2 \times 2$ matrix $A$ given by:
\[
A = \begin{bmatrix} 0.3 & 0.5 \\ 0.2 & 0.7 \end{bmatrix}
\]
which has all positive elements. You can verify that the eigenvalues of $A$ are
\[
\hat{\lambda} = \lambda_1 = 0.87417, \quad \lambda_2 = 0.12583,
\]
that $|\lambda_2| < \hat{\lambda}$, that the associated right-hand eigenvectors are:
\[
\hat{x} = x_1 = \begin{bmatrix} 0.70753 \\ 0.81248 \end{bmatrix}, \quad x_2 = \begin{bmatrix} -0.94435 \\ 0.32895 \end{bmatrix}
\]
and that the associated left-hand eigenvectors are:
\[
\hat{y}^T = y_1^T = \begin{bmatrix} 0.3544 & 1.0174 \end{bmatrix}, \quad y_2^T = \begin{bmatrix} -0.754 & 0.65672 \end{bmatrix}
\]
and that $\hat{x}$ and $\hat{y}$ have all positive elements, while the other eigenvectors do not have all positive elements.
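The Perron root and its positive eigenvector can be extracted with `np.linalg.eig` (a sketch; the sign normalization and variable names are ours — eigenvectors are only unique up to a scalar multiple):

```python
import numpy as np

# The Perron root and positive eigenvector of the example's positive matrix.
A = np.array([[0.3, 0.5],
              [0.2, 0.7]])
lam, V = np.linalg.eig(A)
k = int(np.argmax(lam.real))
perron_root = lam.real[k]            # approximately 0.87417
other_root = lam.real[1 - k]         # approximately 0.12583
x_hat = V[:, k].real
x_hat = x_hat / np.sign(x_hat[0])    # fix the sign so the vector is positive
```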
3.12.2 Markov Chains

Suppose that workers can be in one of two states: employed (state 1) or unemployed (state 2). Suppose further that the probability of a worker being employed next year depends only on whether he is employed or unemployed this year.
First suppose the worker is employed this year. Let the probability of employment next year given that he is employed this year be $p_{11}$, where $0 < p_{11} < 1$. Since he will be either employed or unemployed next year, the probability of unemployment next year given that he is employed this year is $p_{12} = 1 - p_{11}$, with $0 < p_{12} < 1$.
Now suppose he is unemployed this year. Let the probability of unemployment next year given that he is unemployed this year be $p_{22}$, where $0 < p_{22} < 1$. Since he will be either employed or unemployed next year, the probability of employment next year given that he is unemployed this year is $p_{21} = 1 - p_{22}$, with $0 < p_{21} < 1$.
We can put all these probabilities in a $2 \times 2$ matrix $P$, called a transition matrix, as:
\[
P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}.
\]
Note that the rows of $P$ sum to $1$ since probabilities sum to $1$, and all the elements of $P$ are positive, so later on we can apply the Perron-Frobenius theorem.
Now suppose you want to know the probability that a worker employed today will be unemployed 2 years from now, or for that matter $n$ years from now. We have the following result:

Theorem 225 Let $p_{ij}(n)$ be the probability of being in state $j$ in $n$ periods given that the worker is in state $i$ today. Then:
\[
\begin{bmatrix} p_{11}(n) & p_{12}(n) \\ p_{21}(n) & p_{22}(n) \end{bmatrix} = P^n = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}^n.
\]
Example: Consider the transition matrix
\[
P = \begin{bmatrix} 0.95 & 0.05 \\ 0.4 & 0.6 \end{bmatrix}
\]
so that someone employed today has a $95\%$ chance of being employed next year, and someone unemployed today has a $60\%$ chance of being unemployed next year.
To calculate the corresponding probabilities for two years from now we calculate:
\[
P^2 = \begin{bmatrix} 0.95 & 0.05 \\ 0.4 & 0.6 \end{bmatrix} \begin{bmatrix} 0.95 & 0.05 \\ 0.4 & 0.6 \end{bmatrix} = \begin{bmatrix} 0.9225 & 0.0775 \\ 0.62 & 0.38 \end{bmatrix}.
\]
Thus someone employed today has a $92\%$ probability of being employed in two years, and someone unemployed today has a $38\%$ probability of being unemployed in 2 years.
Now consider $n = 10$ years in the future. Using the computer we have:
\[
P^{10} = \begin{bmatrix} 0.95 & 0.05 \\ 0.4 & 0.6 \end{bmatrix}^{10} = \begin{bmatrix} 0.88917 & 0.11083 \\ 0.88664 & 0.11336 \end{bmatrix}.
\]
Thus someone employed today has an $89\%$ probability of being employed in 10 years, and someone unemployed today has an $11\%$ probability of being unemployed in 10 years; the probabilities are now nearly independent of whether the worker is employed or unemployed this year!
If we let $n$ get even larger, say $n = 50$, then this pattern becomes even more striking. Using the computer we have:
\[
P^{50} = \begin{bmatrix} 0.95 & 0.05 \\ 0.4 & 0.6 \end{bmatrix}^{50} = \begin{bmatrix} 0.889 & 0.111 \\ 0.889 & 0.111 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 0.89 & 0.11 \end{bmatrix}.
\]
The probabilities $0.89$ and $0.11$ are the long-run probabilities (or equilibrium probabilities) of being employed and unemployed. Thus the long-run rate of unemployment for the work force would be $11\%$.
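These matrix powers are easy to reproduce with `np.linalg.matrix_power` (variable names are ours). For this $P$ the exact long-run row is $[0.4, 0.05]/0.45 = [8/9, 1/9] \approx [0.889, 0.111]$:

```python
import numpy as np

# Iterating the example's transition matrix toward the long-run probabilities.
P = np.array([[0.95, 0.05],
              [0.4, 0.6]])
P2 = np.linalg.matrix_power(P, 2)     # [[0.9225, 0.0775], [0.62, 0.38]]
P50 = np.linalg.matrix_power(P, 50)   # both rows approach [0.889, 0.111]
```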
It turns out that the vector of long-run probabilities:
\[
y = \begin{bmatrix} 0.89 & 0.11 \end{bmatrix}
\]
is the left-hand eigenvector of $P$ associated with the eigenvalue of $1$; that is, $yP = \lambda y$ with $\lambda = 1$, or:
\[
\begin{bmatrix} 0.89 & 0.11 \end{bmatrix} \begin{bmatrix} 0.95 & 0.05 \\ 0.4 & 0.6 \end{bmatrix} = 1 \times \begin{bmatrix} 0.89 & 0.11 \end{bmatrix}.
\]
This is part of a very general result. We have:

Theorem 226 If
\[
P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}
\]
is a transition matrix with positive elements and rows which sum to $1$, then:
\[
\lim_{n \to \infty} P^n = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} p & 1 - p \end{bmatrix}
\]
where $p$ and $1 - p$ are the long-run probabilities of being in state 1 and state 2 and where $0 < p < 1$.
Proof. Since the rows of $P$ sum to $1$ it follows that if
\[
\iota = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
then:
\[
P \iota = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} p_{11} + p_{12} \\ p_{21} + p_{22} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \iota
\]
so that $\hat{x} = \iota$ is the unique positive right-hand eigenvector of $P$ corresponding to the eigenvalue $\hat{\lambda} = \lambda_1 = 1$. By the Perron-Frobenius theorem we know that the other eigenvalue is smaller in absolute value, so that $|\lambda_2| < \hat{\lambda} = 1$. We also know that there exists a corresponding left-hand eigenvector
\[
\hat{y} = \begin{bmatrix} \hat{y}_1 & \hat{y}_2 \end{bmatrix}
\]
with $\hat{y}_1 > 0$ and $\hat{y}_2 > 0$. We can normalize $\hat{y}$ so the elements sum to $1$ by dividing by $\hat{y}_1 + \hat{y}_2$ and setting $p = \frac{\hat{y}_1}{\hat{y}_1 + \hat{y}_2}$, and so we can write $\hat{y}$ as:
\[
\hat{y} = \begin{bmatrix} p & 1 - p \end{bmatrix}.
\]
Now from the spectral representation for $P$ we have:
\[
P = \hat{\lambda} \hat{x} \hat{y} + \lambda_2 x_2 y_2 = \iota \hat{y} + \lambda_2 x_2 y_2
\]
and:
\[
P^n = \iota \hat{y} + \lambda_2^n x_2 y_2.
\]
Since $|\lambda_2| < \hat{\lambda} = 1$ it follows that as $n \to \infty$, $\lambda_2^n \to 0$, so that:
\[
P^n \to \iota \hat{y} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} p & 1 - p \end{bmatrix} = \begin{bmatrix} p & 1 - p \\ p & 1 - p \end{bmatrix}.
\]
Remark 1: There is actually no reason to limit ourselves to 2 states. For example a worker might conceivably be in, say, four states: 1 full-time employment, 2 part-time employment, 3 unemployment and 4 not being in the labour force. In this case the transition matrix is a $4 \times 4$ matrix with positive entries and rows which sum to $1$. For example:
\[
P = \begin{bmatrix} 0.9 & 0.05 & 0.04 & 0.01 \\ 0.4 & 0.5 & 0.06 & 0.04 \\ 0.2 & 0.3 & 0.4 & 0.1 \\ 0.05 & 0.05 & 0.2 & 0.7 \end{bmatrix}
\]
so that someone unemployed today has a probability $p_{32} = 0.3$ of having part-time work next year.
Remark 2: Problems can arise if some of the elements of $P$ are $0$. For example if:
\[
P = \begin{bmatrix} 0.6 & 0.4 \\ 0 & 1 \end{bmatrix}
\]
then if state 2 is unemployment, an unemployed worker never finds a job. In this case unemployment is an absorbing state and employment is a transitory state; all workers eventually become permanently unemployed.
3.12.3 General Equilibrium and Matrix Algebra

One of the first things you learn in economics is the supply and demand model. This is known as partial equilibrium analysis since it abstracts from the way different markets interact with each other. In general equilibrium analysis on the other hand we explicitly treat the way different markets interact. General equilibrium analysis generally requires quite advanced mathematical techniques. For example it was only with the development in mathematics over the last 60 years of what are called fixed-point theorems that economists have been able to prove that a set of prices exists which will equate demand and supply in all markets in the economy.
Here we will give you a taste of general equilibrium analysis for an economy with a Leontief technology, where technology determines prices independent of tastes. Thus consider an economy where there are $i = 1, 2, \ldots, n$ goods that are produced. Let $a_{ij}$ be the amount of good $j$ needed to produce 1 unit of good $i$. We can put the $a_{ij}$'s into an $n \times n$ matrix $A$ as:
\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.
\]
The matrix $A$, referred to as an input-output matrix, captures the Leontief technology of this economy.
The matrix A; referred to as an input-output matrix, captures the Leontief
technology of this economy.
Let pj be the price of good j: The cost of producing one unit of good i is
given by:
ci = ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn
or if we de…ne the n £ 1 vector of costs as c = [ci ] and the n £ 1 vector of prices:
p = [pj ] then in matrix form:
c = Ap:
The revenue from producing 1 unit of good i is just the price: pi so that
pro…ts are given by:
pi ¡ (ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn )
and the rate of profit in industry $i$ is given by profits divided by costs, so that:
\[
\pi_i = \frac{p_i - (a_{i1} p_1 + a_{i2} p_2 + \cdots + a_{in} p_n)}{a_{i1} p_1 + a_{i2} p_2 + \cdots + a_{in} p_n}.
\]
Now in equilibrium the rate of profit must be the same in each industry, otherwise no production would take place in those industries with a lower rate of profit. Thus we require that $\pi_i = \pi$ for all $i$, so that:
\[
\pi = \frac{p_i - (a_{i1} p_1 + a_{i2} p_2 + \cdots + a_{in} p_n)}{a_{i1} p_1 + a_{i2} p_2 + \cdots + a_{in} p_n}
\]
or:
\[
p_i = (a_{i1} p_1 + a_{i2} p_2 + \cdots + a_{in} p_n)(1 + \pi)
\]
or in matrix notation:
\[
p = A p (1 + \pi)
\]
or written slightly differently:
\[
A p = \frac{1}{1 + \pi} p.
\]
Note this takes the same form as $Ax = \lambda x$ where $x$ is an eigenvector and $\lambda$ is an eigenvalue. From this it follows that $x = p$ is an eigenvector of the matrix $A$ and that $\lambda = \frac{1}{1+\pi}$ is an eigenvalue of $A$.
In general there will be $n$ eigenvalues and eigenvectors, so the question is which one is the appropriate one. Since $p$ is a vector of prices and prices are all positive, we cannot accept eigenvectors with negative elements.
From the Perron-Frobenius theorem we know that there is only one eigenvector $p = \hat{x}$ with all positive elements, and this corresponds to the eigenvalue $\hat{\lambda} > 0$ which determines the rate of profit of the economy. We therefore have:
Theorem 227 There exists a unique (up to a scalar multiple) positive price vector which is the positive right-hand eigenvector of $A$ associated with the eigenvalue $\hat{\lambda}$.

Theorem 228 The equilibrium rate of profit is given by:
\[
\pi = \frac{1}{\hat{\lambda}} - 1.
\]
Remark: If $p = \hat{x}$ is an equilibrium vector of prices then so too is $\alpha p$ where $\alpha$ is any positive scalar. This non-uniqueness is a general feature of general equilibrium models and corresponds to the fact that agents only care about relative prices. Thus if $\alpha = 2$ and we double all prices in the economy this will have no effect on rational economic decision making or equilibrium in the economy. Thus while $p$ and $2p$ correspond to different nominal price vectors, real or relative prices are the same.
Note that $p = \hat{x}$ is a right-hand eigenvector. It turns out that the left-hand eigenvector $\hat{y}$ also has an interesting economic interpretation. Thus by the Perron-Frobenius theorem we know that corresponding to $\hat{\lambda}$ there exists a unique positive eigenvector $\hat{y}$ which satisfies $\hat{y}^T A = \hat{\lambda} \hat{y}^T$. It turns out that:

Theorem 229 $\hat{y}$ determines the balanced growth path for the economy and $\hat{\lambda}$ the growth rate of the economy.
Proof. Let $y_i$ be the amount of good $i$ produced. Then the input requirement of good $j$ will be:
\[
r_j = y_1 a_{1j} + y_2 a_{2j} + \cdots + y_n a_{nj}
\]
or, if we define the $1 \times n$ vector of production levels as $y = [y_i]$ and the $1 \times n$ vector of input requirements as $r = [r_j]$, then in matrix notation:
\[
r = y A.
\]
If there is balanced growth, so that there is no unemployment or shortages in the economy, then:
\[
y = (1 + \rho) r
\]
where $\rho$ is the growth rate of the economy. In matrix notation we then have:
\[
y A = \frac{1}{1 + \rho} y.
\]
Since we require that all elements of $y$ be positive, it follows that $y = \hat{y}$ and $\hat{\lambda} = \frac{1}{1+\rho}$.
Thus $\hat{\lambda}$, in addition to determining the profit rate, also determines the growth rate of the economy along the balanced growth path. Therefore:

Theorem 230 With balanced growth the rate of growth of the economy and the rate of profit are identical and are given by:
\[
\pi = \rho = \frac{1}{\hat{\lambda}} - 1.
\]
Example: Suppose the economy has $n = 2$ sectors and
\[
A = \begin{bmatrix} 0.3 & 0.65 \\ 0.2 & 0.72 \end{bmatrix}.
\]
The two eigenvalues of $A$ are then $\lambda_1 = \hat{\lambda} = 0.92725$ and $\lambda_2 = 0.0927$. It follows that the rate of profit $\pi$ and the growth rate of the economy $\rho$ are identical:
\[
\hat{\lambda} = 0.92725 = \frac{1}{1 + \pi} = \frac{1}{1 + \rho}
\]
so that:
\[
\pi = \rho = \frac{1}{0.92725} - 1 = 0.0785
\]
and the profit and growth rates are both $7.85\%$.
The positive right-hand eigenvector associated with $\hat{\lambda}$ determines prices and is:
\[
p = \hat{x} = \begin{bmatrix} 0.81754 \\ 0.78893 \end{bmatrix}.
\]
Thus the relative price of goods 1 and 2 will be
\[
\frac{p_1}{p_2} = \frac{0.81754}{0.78893} = 1.036
\]
so if $p_2 = 1$ (the second good is the numeraire) then the real price of good 1 is $1.036$ units of good 2.
The positive left-hand eigenvector of $A$ determines the balanced growth path and is:
\[
\hat{y} = \begin{bmatrix} 0.34513 & 1.0824 \end{bmatrix}
\]
and so:
\[
\frac{\hat{y}_1}{\hat{y}_2} = \frac{0.34513}{1.0824} = 0.319
\]
so that if no resources are to be unemployed and there are to be no shortages, then along the balanced growth path $0.319$ units of good 1 will be produced for every unit of good 2 produced.
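The whole example reduces to one eigenvalue computation (a sketch; the sign normalization and variable names are ours):

```python
import numpy as np

# The two-sector example: the Perron root of the input-output matrix gives
# the common profit/growth rate; the positive eigenvector gives relative prices.
A = np.array([[0.3, 0.65],
              [0.2, 0.72]])
lam, V = np.linalg.eig(A)
k = int(np.argmax(lam.real))
lam_hat = lam.real[k]            # approximately 0.92725
profit_rate = 1 / lam_hat - 1    # approximately 0.0785, i.e. 7.85%
p = V[:, k].real
p = p / np.sign(p[0])            # a positive price vector
relative_price = p[0] / p[1]     # approximately 1.036
```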
Chapter 4

Multivariate Calculus

4.1 Functions of Many Variables
To treat variables as constants is the characteristic vice of the unmathematical economist. -Francis Edgeworth

Functions of only one variable, $y = f(x)$, can only take us so far. Usually when we write such functions we have in mind that there are other variables in the background that are kept constant. For example although we might write a demand function as:
\[
Q = Q(P)
\]
we know this is wrong: the quantity demanded depends not only on the own price $P$, but also on the prices of other goods $P_1, P_2, \ldots, P_n$ (substitutes and complements) and on income $Y$. We should instead write a demand function as:
\[
Q = Q(P, P_1, P_2, \ldots, P_n, Y).
\]
The same argument applies equally well to almost anything else we consider in economics, for the simple reason that economic variables are generally influenced by many other variables and not just one.
Thus we now change our focus to functions of the form:
\[
y = f(x_1, x_2, \ldots, x_n)
\]
and the calculus tools we need to work with these functions.
Multivariate functions are generally hard to visualize since in order to graph them we need $n + 1$ dimensions: $n$ dimensions for the $x_i$'s and one dimension for $y$. A function with $n = 2$ independent variables, $y = f(x_1, x_2)$, requires a three-dimensional graph, something which can be represented (with difficulty) on a two-dimensional page. For functions with $n \ge 3$ however we really cannot directly visualize the function. Following what we have learned in linear algebra, we can nevertheless know a lot about these functions analytically. For example we will be able to tell which functions are mountains, which are valleys, and where the tops and bottoms of these valleys are.
It is often tedious to explicitly write out all $n$ of the $x_i$'s. Instead we can put all of them in an $n \times 1$ vector $x$ and write $y = f(x_1, x_2, \ldots, x_n)$ more compactly as $y = f(x)$. Note that this looks exactly like a function in univariate calculus, but where $x$ is now interpreted as an $n \times 1$ vector.
Example: Consider a multivariate function with n = 2 as:
2
2
y = f (x) = f (x1 ; x2 ) = e¡ 2 (x1 +x2 ¡x1 x2 ) :
1
where x = [x1 ; x2 ]T is a 2£ 1 vector. This is a three-dimensional mountain as
depicted below:
y
x2
x1
:
¡ 12 (x21 +x22 ¡x1 x2 )
f (x1 ; x2 ) = e
where the vertical axis is y and the two dimensional plane has x1 on one axis
and x2 on the other.
If $x_1 = 2$ and $x_2 = 1$ we have:
$$y = f(2, 1) = e^{-\frac{1}{2}\left(2^2 + 1^2 - 2 \times 1\right)} = 0.22313$$
while if $x_1 = -\frac{1}{2}$ and $x_2 = \frac{2}{3}$ then:
$$y = f\left(-\frac{1}{2}, \frac{2}{3}\right) = e^{-\frac{1}{2}\left(\left(-\frac{1}{2}\right)^2 + \left(\frac{2}{3}\right)^2 - \left(-\frac{1}{2}\right)\times\frac{2}{3}\right)} = 0.598.$$
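As a quick numerical check (a sketch of mine, not part of the original text), the two function values above can be reproduced in Python:

```python
import math

# The "mountain" f(x1, x2) = exp(-(x1^2 + x2^2 - x1*x2)/2) from the example above.
def f(x1, x2):
    return math.exp(-0.5 * (x1**2 + x2**2 - x1 * x2))

print(f(2, 1))       # about 0.22313
print(f(-1/2, 2/3))  # about 0.598
```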
4.2 Partial Derivatives
Mathematics is a language. -Josiah Willard Gibbs
The cornerstone of multivariate calculus is the partial derivative. Given a function of $n$ variables, $y = f(x_1, x_2, \ldots, x_n)$, there will be $n$ partial derivatives, one for each of the $x_i$'s. Calculating partial derivatives is really no more difficult than calculating an ordinary derivative in univariate calculus. We have:

Definition 231 Partial Derivative: Given the function $y = f(x_1, x_2, \ldots, x_n)$, the partial derivative with respect to $x_i$, denoted by:
$$\frac{\partial y}{\partial x_i} \equiv \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i},$$
is the ordinary derivative of $f(x_1, x_2, \ldots, x_n)$ with respect to $x_i$ obtained by treating all other $x_j$'s (for $j \neq i$) as constants.
Remark 1: For ordinary derivatives we use $d$, as in $\frac{dy}{dx}$, while for partial derivatives we use the old German letter d, '$\partial$', as in $\frac{\partial y}{\partial x_i}$.
Remark 2: Another useful notation for a partial derivative is to write either $x_i$ or $i$ as a subscript:
$$\frac{\partial y}{\partial x_i} \equiv f_i(x_1, x_2, \ldots, x_n) \equiv f_{x_i}(x_1, x_2, \ldots, x_n).$$
Unlike $\frac{\partial y}{\partial x_i}$, this notation emphasizes the fact that, like $f(x_1, x_2, \ldots, x_n)$, the partial derivative is itself a multivariate function of $x_1, x_2, \ldots, x_n$.
Remark 3: A very bad notation, often used by students, is to write a partial derivative as:
$$f'(x_1, x_2, \ldots, x_n).$$
The problem is that a partial derivative is always taken with respect to a particular $x_i$, but this notation does not tell you which $x_i$ you are differentiating with respect to. Thus if you write $f'(x_1, x_2)$ there is no way of knowing whether you mean $\frac{\partial f(x_1, x_2)}{\partial x_1}$ or $\frac{\partial f(x_1, x_2)}{\partial x_2}$. Therefore: do not use the notation $f'(\,)$ for partial derivatives.
Example 1: To calculate the partial derivative of
$$y = f(x_1, x_2) = x_1^5 x_2^7$$
with respect to $x_1$, we treat $x_2$ as a constant and differentiate with respect to $x_1$ to obtain:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} \equiv \frac{\partial y}{\partial x_1} \equiv \frac{\partial}{\partial x_1}\left(x_1^5 x_2^7\right) = 5 x_1^4 x_2^7.$$
Note that $5 x_1^4 x_2^7$ is a function of both $x_1$ and $x_2$; that is, although in the calculation we treated $x_2$ as a constant, after we have calculated $\frac{\partial f(x_1, x_2)}{\partial x_1}$, $x_2$ reverts to its former status as a variable, just like $x_1$. That is why we write $\frac{\partial f(x_1, x_2)}{\partial x_1}$ and not $\frac{\partial f(x_1)}{\partial x_1}$.
To calculate the derivative of $f(x_1, x_2)$ with respect to $x_2$, treat $x_1$ as a constant and differentiate with respect to $x_2$ to obtain:
$$\frac{\partial f(x_1, x_2)}{\partial x_2} \equiv \frac{\partial y}{\partial x_2} \equiv \frac{\partial}{\partial x_2}\left(x_1^5 x_2^7\right) = 7 x_1^5 x_2^6.$$
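Partial derivatives like these can be checked symbolically; a minimal sketch of mine using sympy (not part of the original text):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**5 * x2**7

# Differentiate with respect to one variable, treating the other as a constant.
fx1 = sp.diff(f, x1)  # 5*x1**4*x2**7
fx2 = sp.diff(f, x2)  # 7*x1**5*x2**6
```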
Example 2: Given:
$$y = f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2$$
there are two partial derivatives: $\frac{\partial f(x_1, x_2)}{\partial x_1}$ and $\frac{\partial f(x_1, x_2)}{\partial x_2}$. To calculate $\frac{\partial f(x_1, x_2)}{\partial x_1}$ we treat $x_2$ as a constant and differentiate with respect to $x_1$:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = \frac{\partial}{\partial x_1}\left(\frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2\right) = \frac{1}{3}\underbrace{\frac{\partial}{\partial x_1}\ln(x_1)}_{= \frac{1}{x_1}} + \frac{2}{3}\underbrace{\frac{\partial}{\partial x_1}\ln(x_2)}_{= 0 \text{ since } x_2 \text{ is a constant}} - x_2\underbrace{\frac{\partial}{\partial x_1}x_1}_{= 1} = \frac{1}{3}\frac{1}{x_1} - x_2.$$
To calculate $\frac{\partial f(x_1, x_2)}{\partial x_2}$ we treat $x_1$ as a constant and differentiate with respect to $x_2$:
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = \frac{\partial}{\partial x_2}\left(\frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2\right) = \frac{2}{3}\frac{1}{x_2} - x_1.$$
Example 3: Given:
$$f(x_1, x_2, x_3) = x_3^2 x_1^3 + 2\ln(x_2)\, x_1^2$$
we have:
$$\frac{\partial f(x_1, x_2, x_3)}{\partial x_1} = \frac{\partial}{\partial x_1}\left(x_3^2 x_1^3 + 2\ln(x_2)\, x_1^2\right) = 3 x_3^2 x_1^2 + 4\ln(x_2)\, x_1,$$
$$\frac{\partial f(x_1, x_2, x_3)}{\partial x_2} = \frac{\partial}{\partial x_2}\left(x_3^2 x_1^3 + 2\ln(x_2)\, x_1^2\right) = \frac{2 x_1^2}{x_2},$$
$$\frac{\partial f(x_1, x_2, x_3)}{\partial x_3} = \frac{\partial}{\partial x_3}\left(x_3^2 x_1^3 + 2\ln(x_2)\, x_1^2\right) = 2 x_1^3 x_3.$$
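The three partials of Example 3 can be verified the same way; a sympy sketch (my addition, not the author's):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
f = x3**2 * x1**3 + 2 * sp.log(x2) * x1**2

# One partial derivative per variable, with all others held constant.
f1, f2, f3 = (sp.diff(f, v) for v in (x1, x2, x3))
```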
4.2.1 The Gradient
It is often tedious to write down all $n$ partial derivatives of $f(x_1, x_2, \ldots, x_n)$. Just as we can write $f(x_1, x_2, \ldots, x_n)$ as $f(x)$ by letting $x$ be an $n \times 1$ vector, we can use matrix algebra to obtain a more compact notation by putting each of the $n$ partial derivatives into an $n \times 1$ vector, called the gradient. We have:
Definition 232 Gradient: Given the function $y = f(x)$ where $x$ is an $n \times 1$ vector, the gradient is the $n \times 1$ vector of partial derivatives, denoted by $\nabla f(x)$ or $\frac{\partial f(x)}{\partial x}$:
$$\frac{\partial f(x)}{\partial x} \equiv \nabla f(x) \equiv \begin{bmatrix} \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_1} \\ \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_n} \end{bmatrix}.$$
Example 1: Given:
$$f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2$$
the gradient is a $2 \times 1$ vector given by:
$$\nabla f(x_1, x_2) = \begin{bmatrix} \frac{\partial f(x_1, x_2)}{\partial x_1} \\ \frac{\partial f(x_1, x_2)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} \frac{1}{3 x_1} - x_2 \\ \frac{2}{3 x_2} - x_1 \end{bmatrix}.$$

Example 2: Given:
$$f(x_1, x_2, x_3) = x_3^2 x_1^3 + 2\ln(x_2)\, x_1^2$$
the gradient is a $3 \times 1$ vector given by:
$$\nabla f(x_1, x_2, x_3) = \begin{bmatrix} 3 x_3^2 x_1^2 + 4\ln(x_2)\, x_1 \\ \frac{2 x_1^2}{x_2} \\ 2 x_3 x_1^3 \end{bmatrix}.$$
Imagine you are standing on a three-dimensional mountain $y = f(x_1, x_2)$. Looking at the slope, you turn around until you are facing in the direction where the mountain is steepest, the direction in which the climbing would be hardest. You are then looking in the direction of the gradient $\nabla f(x_1, x_2)$. In general we have:

Theorem 233 The gradient $\nabla f(x)$ points in the direction in which the function $f(x)$ is steepest.
Example: In Example 1 above we calculated the gradient of the function $\frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2$. For $x_1 = \frac{1}{3}$, $x_2 = \frac{2}{3}$ the gradient is:
$$\nabla f\left(\tfrac{1}{3}, \tfrac{2}{3}\right) = \begin{bmatrix} \frac{1}{3 x_1} - x_2 \\ \frac{2}{3 x_2} - x_1 \end{bmatrix}_{x_1 = \frac{1}{3},\; x_2 = \frac{2}{3}} = \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}$$
and so the function is steepest in the direction of the vector depicted below:

[Figure: the vector $\left(\frac{1}{3}, \frac{2}{3}\right)$ plotted in the $(x_1, x_2)$ plane.]
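A hedged numerical check of this gradient (my own sketch, not part of the original text):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = sp.log(x1)/3 + 2*sp.log(x2)/3 - x1*x2

# Stack the two partial derivatives into the gradient vector, then evaluate.
grad = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])
val = grad.subs({x1: sp.Rational(1, 3), x2: sp.Rational(2, 3)})
print(val.T)  # the vector (1/3, 2/3)
```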
4.2.2 Interpreting Partial Derivatives
A partial derivative is much like the result of a controlled experiment. Suppose, for example, you want to know how vitamin C affects the life expectancy of rats. In a proper experiment you would try to hold all variables constant except vitamin C, vary the consumption of vitamin C, and observe what happens to the rats' life expectancy. If you see that the rats with more vitamin C live longer (shorter), you can then conclude that there exists a positive (negative) relationship between vitamin C and the life expectancy of rats.

Now, instead of real rats, suppose that we have a multivariate function $y = f(x_1, x_2, \ldots, x_n)$ where $y$ is life expectancy and $x_i$ is vitamin C consumption. Just as with real rats, we want to know how $x_i$ affects $y$. Instead of an experiment we calculate the partial derivative $\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}$. Just as in the experiment, we hold all other variables constant when calculating the partial derivative. The sign of this partial derivative then tells us the nature of the relationship between $x_i$ and $y$. In particular:
Theorem 234 Given $y = f(x_1, x_2, \ldots, x_n)$, if:
$$\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i} > 0$$
then $y$ is an increasing function of $x_i$; that is, increasing (decreasing) $x_i$ holding all other $x_j$'s fixed will increase (decrease) $y$.

Theorem 235 Given $y = f(x_1, x_2, \ldots, x_n)$, if:
$$\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i} < 0$$
then $y$ is a decreasing function of $x_i$; that is, increasing (decreasing) $x_i$ holding all other $x_j$'s fixed will decrease (increase) $y$.

Remark: As in univariate calculus, these properties can hold either locally or globally. If $\frac{\partial f(x)}{\partial x_i} > 0$ for all $x$ in the domain we say that $y$ is a globally increasing function of $x_i$; if $\frac{\partial f(x)}{\partial x_i} > 0$ only at a point then $y$ is a locally increasing function of $x_i$.
The partial derivative also gives us quantitative information about the relationship between $x_i$ and $y$; in particular, it gives us the $x_i$ multiplier.

Theorem 236 Given $y = f(x_1, x_2, \ldots, x_n)$, if $x_i$ is changed by $\Delta x_i$ with all other $x_j$'s kept constant, then the change in $y$ is approximately:
$$\Delta y \approx \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}\,\Delta x_i$$
where the approximation gets better the smaller is $\Delta x_i$.
Example 1: Consider:
$$y = f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2$$
where:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = \frac{1}{3 x_1} - x_2.$$
At $x_1 = \frac{1}{12}$ and $x_2 = \frac{1}{2}$ we have:
$$\frac{\partial f\left(\frac{1}{12}, \frac{1}{2}\right)}{\partial x_1} = \frac{1}{3\left(\frac{1}{12}\right)} - \frac{1}{2} = \frac{7}{2} > 0$$
so that a positive relationship exists between $x_1$ and $y$ at $\left(\frac{1}{12}, \frac{1}{2}\right)$; that is, $y$ is a locally increasing function of $x_1$.

For example, if we increase $x_1$ a small amount from $x_1 = \frac{1}{12}$ to, say, $x_1 = \frac{1}{10}$, so that:
$$\Delta x_1 = \frac{1}{10} - \frac{1}{12} = 0.017,$$
and if we keep $x_2$ constant at $\frac{1}{2}$, then $y$ will increase from $y = f\left(\frac{1}{12}, \frac{1}{2}\right) = -1.3321$ to $y = f\left(\frac{1}{10}, \frac{1}{2}\right) = -1.2796$, or by:
$$\Delta y = -1.2796 - (-1.3321) = 0.0525 \approx \frac{\partial f\left(\frac{1}{12}, \frac{1}{2}\right)}{\partial x_1}\,\Delta x_1 = \frac{7}{2} \times 0.017 = 0.0595.$$
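Theorem 236's approximation can be checked numerically for this example; a sketch of mine (using the exact $\Delta x_1 = 1/60$ rather than the rounded 0.017):

```python
import math

def f(x1, x2):
    return math.log(x1)/3 + 2*math.log(x2)/3 - x1*x2

dx1 = 1/10 - 1/12                    # = 1/60, about 0.0167
exact = f(1/10, 1/2) - f(1/12, 1/2)  # the true change in y, about 0.0525
approx = (7/2) * dx1                 # multiplier times the change in x1
```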
Now, focusing on the relationship between $x_2$ and $y$, we have:
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = \frac{2}{3 x_2} - x_1$$
so that:
$$\frac{\partial f\left(\frac{1}{12}, \frac{1}{2}\right)}{\partial x_2} = \frac{2}{3\left(\frac{1}{2}\right)} - \frac{1}{12} = \frac{5}{4} > 0$$
so that $y$ is a locally increasing function of $x_2$. Thus if we increase $x_2$ a small amount from $x_2 = \frac{1}{2}$, keeping $x_1$ constant at $\frac{1}{12}$, then $y$ will increase.
On the other hand, at $x_1 = 2$ and $x_2 = \frac{1}{2}$ we have:
$$\frac{\partial f\left(2, \frac{1}{2}\right)}{\partial x_1} = \frac{1}{3(2)} - \frac{1}{2} = -\frac{1}{3} < 0$$
$$\frac{\partial f\left(2, \frac{1}{2}\right)}{\partial x_2} = \frac{2}{3\left(\frac{1}{2}\right)} - 2 = -\frac{2}{3} < 0$$
and so it follows that $f(x_1, x_2)$ is a locally decreasing function of both $x_1$ and $x_2$ at $\left(2, \frac{1}{2}\right)$.
Example 2: Consider the function:
$$y = f(x_1, x_2) = e^{2 x_1 - 3 x_2}.$$
We have:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = 2 e^{2 x_1 - 3 x_2} > 0$$
for all $x_1, x_2$, and so $y$ is a globally increasing function of $x_1$. Similarly:
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = -3 e^{2 x_1 - 3 x_2} < 0$$
for all $x_1, x_2$, and so $y$ is a globally decreasing function of $x_2$.
4.2.3 The Economic Language of Partial Derivatives
Consider a demand curve:
$$Q = Q(P, P_1, P_2, \ldots, P_n, Y)$$
where $P$ is the own price, $P_1, P_2, \ldots, P_n$ are the prices of other related goods, and $Y$ is income. Now suppose we want to say that the demand curve is downward sloping. What does that mean? In introductory economics we would say that a demand curve is downward sloping if increasing $P$ while holding $P_1, P_2, \ldots, P_n$ and $Y$ constant results in $Q$ going down.

Notice how long it takes to say this! We can be much more concise if we use mathematics and say simply that a demand curve slopes downward if:
$$\frac{\partial Q}{\partial P} < 0.$$
One reason this is more concise is that instead of saying "all other variables are held constant" we merely write $\partial$ instead of $d$ to express this idea.
Similarly, if we want to say that the good is normal, we simply write $\frac{\partial Q}{\partial Y} > 0$. This replaces the introductory definition, which states that a good is normal if increasing $Y$ holding $P, P_1, P_2, \ldots, P_n$ constant causes $Q$ to increase. If we want to say that a good is inferior we write $\frac{\partial Q}{\partial Y} < 0$; if we want to say that good $i$ is a substitute we write $\frac{\partial Q}{\partial P_i} > 0$; if we want to say that good $j$ is a complement we write $\frac{\partial Q}{\partial P_j} < 0$.
These ideas apply equally well to supply curves, cost functions, and just about anything else one considers in economics. Thus much of the informal language used in introductory economics is reformulated in terms of partial derivatives in more advanced economics, which allows ideas to be stated much more concisely.
Example: Consider a demand curve for coffee:
$$Q = P^{-2} P_1^{3} P_2^{-3} Y^{2}$$
where $P$ is the price of coffee, $P_1$ is the price of tea, $P_2$ is the price of sugar and $Y$ is income. Then:
$$\frac{\partial Q}{\partial P} = -2 P^{-3} P_1^{3} P_2^{-3} Y^{2} < 0 \implies \text{the coffee demand curve slopes downward}$$
$$\frac{\partial Q}{\partial P_1} = 3 P^{-2} P_1^{2} P_2^{-3} Y^{2} > 0 \implies \text{coffee and tea are substitutes}$$
$$\frac{\partial Q}{\partial P_2} = -3 P^{-2} P_1^{3} P_2^{-4} Y^{2} < 0 \implies \text{coffee and sugar are complements}$$
$$\frac{\partial Q}{\partial Y} = 2 P^{-2} P_1^{3} P_2^{-3} Y^{1} > 0 \implies \text{coffee is a normal good.}$$

4.2.4 The Use of the Word Marginal
In introductory economics the marginal product of labour is defined as the contribution of the last worker hired to output. In univariate calculus we defined the marginal product of labour as the derivative of the short-run production function $Q = f(L)$ with respect to $L$. The short-run production function is obtained from the production function $Q = F(L, K)$ by holding $K$ constant. Since $K$ was held fixed, this ordinary derivative was actually a partial derivative. Thus the precise definition of the marginal product of labour is as the partial derivative:
$$MP_L(L, K) \equiv \frac{\partial F(L, K)}{\partial L}.$$
In general, then, when economists use the word 'marginal', as in marginal utility, the marginal product of labour or the marginal product of capital, they are referring to a partial derivative in which all other variables are held constant. Thus:

Definition 237 When in economics we refer to a 'marginal' concept we mean a partial derivative.
Example 1: Consider a production function:
$$Q = F(L, K)$$
where $Q$ is output, $L$ is labour and $K$ is capital. The marginal product of labour is the partial derivative:
$$MP_L(L, K) \equiv \frac{\partial F(L, K)}{\partial L}$$
while the marginal product of capital is:
$$MP_K(L, K) \equiv \frac{\partial F(L, K)}{\partial K}.$$
Thus for the Cobb-Douglas production function:
$$Q = L^{\frac{1}{2}} K^{\frac{1}{4}}$$
the marginal products of labour and capital are given by:
$$MP_L(L, K) = \frac{1}{2} L^{-\frac{1}{2}} K^{\frac{1}{4}} > 0, \qquad MP_K(L, K) = \frac{1}{4} L^{\frac{1}{2}} K^{-\frac{3}{4}} > 0.$$
The fact that the marginal products are positive means that $Q$ is a globally increasing function of both $L$ and $K$; that is, labour and capital are productive.
Example 2: Consider a household which gets utility from two goods $Q_1$ and $Q_2$ as:
$$U = U(Q_1, Q_2).$$
The marginal utility of good 1 is:
$$MU_1(Q_1, Q_2) \equiv \frac{\partial U(Q_1, Q_2)}{\partial Q_1}$$
while the marginal utility of good 2 is:
$$MU_2(Q_1, Q_2) \equiv \frac{\partial U(Q_1, Q_2)}{\partial Q_2}.$$
For the Cobb-Douglas utility function:
$$U(Q_1, Q_2) = Q_1^{\frac{1}{3}} Q_2^{\frac{2}{3}}$$
the marginal utilities of $Q_1$ and $Q_2$ are given by:
$$MU_1(Q_1, Q_2) = \frac{1}{3} Q_1^{-\frac{2}{3}} Q_2^{\frac{2}{3}} > 0, \qquad MU_2(Q_1, Q_2) = \frac{2}{3} Q_1^{\frac{1}{3}} Q_2^{-\frac{1}{3}} > 0.$$
The fact that the marginal utilities are positive means that utility is a globally increasing function of both $Q_1$ and $Q_2$; in other words, both $Q_1$ and $Q_2$ are 'goods' and not 'bads'.
4.2.5 Elasticities
Instead of partial derivatives we often prefer to talk about elasticities, since these are free of units of measurement. Again, elasticities are defined under the assumption that all variables but one are held fixed. Thus for multivariate functions we define an elasticity as:
Definition 238 Elasticity: Given $y = f(x_1, x_2, \ldots, x_n)$, the elasticity with respect to $x_i$ is:
$$\eta_i(x_1, x_2, \ldots, x_n) \equiv \frac{\partial y}{\partial x_i}\frac{x_i}{y} \equiv \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}\,\frac{x_i}{f(x_1, x_2, \ldots, x_n)}.$$

In general, elasticities change as $x_1, x_2, \ldots, x_n$ change. The functional form with the property that the elasticities do not depend on $x_1, x_2, \ldots, x_n$ is the multivariate generalization of $y = A x^b$, given below:

Theorem 239 If:
$$y = f(x_1, x_2, \ldots, x_n) = A x_1^{b_1} x_2^{b_2} \times \cdots \times x_n^{b_n}$$
then all elasticities are independent of $x_1, x_2, \ldots, x_n$ and:
$$\eta_i = b_i.$$
Example: Consider again the demand curve for coffee:
$$Q^d = P^{-2} P_1^{3} P_2^{-3} Y^{2}.$$
Note that the demand function has the functional form $A x_1^{b_1} x_2^{b_2} \times \cdots$, and so the elasticities are simply the exponents on each variable. We therefore have:
$$\eta_P = \frac{\partial Q^d}{\partial P}\frac{P}{Q^d} = \frac{-2 P^{-3} P_1^{3} P_2^{-3} Y^{2} \times P}{P^{-2} P_1^{3} P_2^{-3} Y^{2}} = -2$$
$$\eta_{P_1} = \frac{\partial Q^d}{\partial P_1}\frac{P_1}{Q^d} = \frac{3 P^{-2} P_1^{2} P_2^{-3} Y^{2} \times P_1}{P^{-2} P_1^{3} P_2^{-3} Y^{2}} = 3$$
$$\eta_{P_2} = \frac{\partial Q^d}{\partial P_2}\frac{P_2}{Q^d} = \frac{-3 P^{-2} P_1^{3} P_2^{-4} Y^{2} \times P_2}{P^{-2} P_1^{3} P_2^{-3} Y^{2}} = -3$$
$$\eta_Y = \frac{\partial Q^d}{\partial Y}\frac{Y}{Q^d} = \frac{2 P^{-2} P_1^{3} P_2^{-3} Y^{1} \times Y}{P^{-2} P_1^{3} P_2^{-3} Y^{2}} = 2.$$
Thus a 1% increase in $P$ leads to a 2% fall in $Q$ (demand is elastic), a 1% increase in $P_1$ leads to a 3% increase in $Q$ (coffee and tea are substitutes), a 1% increase in $P_2$ leads to a 3% decrease in $Q$ (coffee and sugar are complements), and a 1% increase in $Y$ leads to a 2% increase in $Q$ (coffee is a normal good).
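These constant elasticities can be recovered numerically with a finite difference at any positive point; a sketch of mine (the point values are arbitrary choices):

```python
def Q(P, P1, P2, Y):
    return P**-2 * P1**3 * P2**-3 * Y**2

P, P1, P2, Y = 2.0, 3.0, 1.5, 10.0
h = 1e-6

# Elasticity = (dQ/dP) * (P/Q), with dQ/dP from a central difference.
eta_P = (Q(P + h, P1, P2, Y) - Q(P - h, P1, P2, Y)) / (2*h) * P / Q(P, P1, P2, Y)
eta_Y = (Q(P, P1, P2, Y + h) - Q(P, P1, P2, Y - h)) / (2*h) * Y / Q(P, P1, P2, Y)
```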
4.2.6 The Chain Rule
We need the chain rule whenever we work with functions of functions. Consider, then, the following situation. We have an outside function:
$$y = f(x_1, x_2, \ldots, x_n)$$
and $n$ inside functions $g_1(w), g_2(w), \ldots, g_n(w)$, where $w$ is a scalar. We replace each $x_i$ with the inside function $g_i(w)$ to obtain:
$$h(w) = f(g_1(w), g_2(w), \ldots, g_n(w)).$$
The multivariate chain rule then tells us how to calculate $h'(w)$:

Theorem 240 Multivariate Chain Rule: If
$$h(w) = f(g_1(w), g_2(w), \ldots, g_n(w))$$
then:
$$h'(w) = \frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_1}\, g_1'(w) + \frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_2}\, g_2'(w) + \cdots + \frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_n}\, g_n'(w).$$
Remark 1: Think of a multinational oil company that has $n$ subsidiaries in $n$ different countries. Suppose that $w$ is the price of oil, that the inside function $g_i(w)$ gives the before-tax profits in country $i$ in the local currency, and that the outside function $f(x_1, x_2, \ldots, x_n)$ converts the profits in each country's local currency into, say, US dollars. Thus $h(w)$ gives total profits in US dollars as a function of the price of oil, and $h'(w)$ indicates how profits change as the price of oil changes. The terms in the chain rule take the form:
$$\frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_i}\, g_i'(w).$$
Here $g_i'(w)$ tells you how profits in country $i$ change as the price of oil changes, while the multiplier:
$$\frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_i}$$
indicates how a change in local-currency profits affects aggregate US-dollar profits. The total effect of a change in the price of oil, $h'(w)$, is then the sum of these effects over the $n$ subsidiaries, as indicated by the chain rule.
Remark 2: Although the multivariate chain rule might look complicated, it is exactly the same idea as in univariate calculus, where one starts by taking the derivative of the outside function, replaces the $x$ with the inside function, and then multiplies by the derivative of the inside function. The only difference is that there are now $n$ $x_i$'s and $n$ inside functions, the $g_i(w)$'s. One thus takes the partial derivative of the outside function with respect to each $x_i$, replaces each $x_i$ with the corresponding inside function, multiplies by $g_i'(w)$, and adds up all $n$ terms. A recipe for this goes as follows:

A Recipe for the Multivariate Chain Rule

Starting with $x_1$:

1. Take the partial derivative of the outside function $f(x_1, x_2, \ldots, x_n)$ with respect to $x_i$: $\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}$.

2. Replace every occurrence of each $x_i$ in $\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}$ with the corresponding inside function $g_i(w)$ to obtain: $\frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_i}$.

3. Multiply the result in 2 by $g_i'(w)$ to obtain: $\frac{\partial f(g_1(w), g_2(w), \ldots, g_n(w))}{\partial x_i}\, g_i'(w)$.

4. Repeat steps 1 to 3 for each $x_i$.

5. Add up the resulting $n$ terms to obtain $h'(w)$.
Example 1: Consider a function with two $x$'s:
$$f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 - x_2$$
and let $g_1(w) = w^2$ and $g_2(w) = e^w$ be the two inside functions. If we replace every occurrence of $x_1$ with $w^2$ and every occurrence of $x_2$ with $e^w$ we obtain:
$$h(w) = f(g_1(w), g_2(w)) = f\left(w^2, e^w\right) = \frac{1}{3}\ln\left(w^2\right) + \frac{2}{3}\ln\left(e^w\right) - w^2 - e^w = \frac{2}{3}\ln(w) + \frac{2}{3}w - w^2 - e^w.$$
We can then calculate $h'(w)$ directly as:
$$h'(w) = \frac{2}{3w} + \frac{2}{3} - 2w - e^w.$$
Now let us use the recipe for the multivariate chain rule to calculate $h'(w)$. Following the recipe we have:
1. The partial derivative of the outside function with respect to $x_1$ is:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = \frac{1}{3 x_1} - 1.$$

2. Replacing $x_1$ with $w^2$ and $x_2$ with $e^w$ in 1 results in:
$$\frac{\partial f\left(w^2, e^w\right)}{\partial x_1} = \frac{1}{3 w^2} - 1.$$

3. Since $g_1(w) = w^2$ we have $g_1'(w) = 2w$, and so multiplying 2 by $g_1'(w)$ yields:
$$\frac{\partial f\left(w^2, e^w\right)}{\partial x_1}\, g_1'(w) = \left(\frac{1}{3 w^2} - 1\right) \times 2w.$$

4. Repeat steps 1 to 3 with respect to $x_2$. Thus for $x_2$ we have:
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = \frac{2}{3 x_2} - 1$$
and so, replacing $x_1$ and $x_2$ by $w^2$ and $e^w$:
$$\frac{\partial f\left(w^2, e^w\right)}{\partial x_2} = \frac{2}{3 e^w} - 1$$
and multiplying by $g_2'(w) = e^w$ we get:
$$\left(\frac{2}{3 e^w} - 1\right) \times e^w.$$

5. Adding up the resulting terms yields:
$$h'(w) = \left(\frac{1}{3 w^2} - 1\right) \times 2w + \left(\frac{2}{3 e^w} - 1\right) \times e^w = \frac{2}{3w} + \frac{2}{3} - 2w - e^w$$
and so we get the same answer as the direct calculation.
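The same check can be automated with sympy: compute $h'(w)$ by the chain rule term by term and by direct substitution, and confirm they agree (a sketch of mine, not the book's):

```python
import sympy as sp

w, x1, x2 = sp.symbols('w x1 x2', positive=True)
f = sp.log(x1)/3 + 2*sp.log(x2)/3 - x1 - x2   # outside function
g1, g2 = w**2, sp.exp(w)                      # inside functions

# Chain rule: each partial of f evaluated at the inside functions, times g_i'.
chain = (sp.diff(f, x1).subs({x1: g1, x2: g2}) * sp.diff(g1, w)
         + sp.diff(f, x2).subs({x1: g1, x2: g2}) * sp.diff(g2, w))

# Direct route: substitute first, then differentiate with respect to w.
direct = sp.diff(f.subs({x1: g1, x2: g2}), w)
```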
Example 2: Proving the Product Rule Using the Chain Rule. Suppose the outside function is:
$$p(x_1, x_2) = x_1 x_2$$
so that $p(x_1, x_2)$ simply multiplies $x_1$ and $x_2$ together. The two inside functions are any two univariate functions $f(x)$ and $g(x)$, where $x$ here is a scalar, so that:
$$h(x) = p(f(x), g(x)) = f(x)\, g(x)$$
and so $h(x)$ is just the product of the two functions $f(x)$ and $g(x)$.

To find $h'(x)$ we use the multivariate chain rule. We have:
$$\frac{\partial p(x_1, x_2)}{\partial x_1} = x_2 \quad \text{and} \quad \frac{\partial p(x_1, x_2)}{\partial x_2} = x_1$$
so that:
$$h'(x) = \frac{\partial p(f(x), g(x))}{\partial x_1}\, f'(x) + \frac{\partial p(f(x), g(x))}{\partial x_2}\, g'(x) = g(x) f'(x) + f(x) g'(x)$$
which is the product rule of univariate calculus.
4.2.7 A More General Multivariate Chain Rule
The chain rule can be generalized to the case where the inside functions are themselves multivariate. Although this is more general, if you understand the previous chain rule there are really no new ideas involved, except that some things which were ordinary derivatives become partial derivatives.
Theorem 241 Multivariate Chain Rule: If $w$ in $g_i(w)$ is an $m \times 1$ vector, so that $g_i(w) = g_i(w_1, w_2, \ldots, w_m)$, and if
$$h(w_1, w_2, \ldots, w_m) = f(g_1(w), g_2(w), \ldots, g_n(w))$$
then:
$$\frac{\partial h(w_1, w_2, \ldots, w_m)}{\partial w_j} = \frac{\partial f(g_1(w), \ldots, g_n(w))}{\partial x_1}\frac{\partial g_1(w_1, \ldots, w_m)}{\partial w_j} + \frac{\partial f(g_1(w), \ldots, g_n(w))}{\partial x_2}\frac{\partial g_2(w_1, \ldots, w_m)}{\partial w_j} + \cdots + \frac{\partial f(g_1(w), \ldots, g_n(w))}{\partial x_n}\frac{\partial g_n(w_1, \ldots, w_m)}{\partial w_j}.$$

4.2.8 Homogeneous Functions
In agriculture, the state of the art being given, doubling the labour does not double the produce. -John Stuart Mill

In economics one encounters many functions which are homogeneous. Demand and supply curves are always homogeneous of degree 0, the marginal utility of income is always homogeneous of degree $-1$, and cost functions are always homogeneous of degree 1. The homogeneity of a production function determines whether there are decreasing (small is beautiful), constant, or increasing returns to scale (bigger is better).

Homogeneity is defined as follows:
Definition 242 A function $f(x_1, x_2, \ldots, x_n)$ is said to be homogeneous of degree $k$ if and only if for any $\lambda > 0$:
$$f(\lambda x_1, \lambda x_2, \ldots, \lambda x_n) = \lambda^k f(x_1, x_2, \ldots, x_n).$$

Remark: To prove that a given function is homogeneous of some degree, one begins with:
$$f(\lambda x_1, \lambda x_2, \ldots, \lambda x_n)$$
and through a series of derivations tries to obtain:
$$\lambda^k f(x_1, x_2, \ldots, x_n).$$
The exponent on $\lambda$ then gives the degree of homogeneity.
Example 1: The Cobb-Douglas production function:
$$Q = F(L, K) = A L^{\alpha} K^{\beta}$$
is homogeneous of degree $k = \alpha + \beta$, the sum of the exponents on labour and capital, since:
$$F(\lambda L, \lambda K) = A(\lambda L)^{\alpha}(\lambda K)^{\beta} = A \lambda^{\alpha} L^{\alpha} \lambda^{\beta} K^{\beta} = \lambda^{\alpha+\beta} A L^{\alpha} K^{\beta} = \lambda^{\alpha+\beta} F(L, K).$$
Thus $Q = L^{\frac{1}{2}} K^{\frac{1}{4}}$ is homogeneous of degree $k = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}$.
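A quick numeric homogeneity check for this case (my sketch; the point and $\lambda$ are arbitrary choices):

```python
def F(L, K):
    return L**0.5 * K**0.25

L, K, lam = 4.0, 16.0, 2.0

# Degree k = 3/4: scaling both inputs by lam scales output by lam**0.75.
lhs = F(lam * L, lam * K)
rhs = lam**0.75 * F(L, K)
```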
Example 2: The Constant Elasticity of Substitution (CES) production function:
$$Q = F(L, K) = \left(\alpha L^{\rho} + (1 - \alpha) K^{\rho}\right)^{\frac{\gamma}{\rho}}$$
is homogeneous of degree $\gamma$ since:
$$F(\lambda L, \lambda K) = \left(\alpha(\lambda L)^{\rho} + (1 - \alpha)(\lambda K)^{\rho}\right)^{\frac{\gamma}{\rho}} = \left(\lambda^{\rho}\left(\alpha L^{\rho} + (1 - \alpha) K^{\rho}\right)\right)^{\frac{\gamma}{\rho}} = \left(\lambda^{\rho}\right)^{\frac{\gamma}{\rho}}\left(\alpha L^{\rho} + (1 - \alpha) K^{\rho}\right)^{\frac{\gamma}{\rho}} = \lambda^{\gamma} F(L, K).$$
For example, if $\alpha = \frac{1}{2}$, $\rho = -1$ and $\gamma = 1$, then the CES production function:
$$Q = \left(\tfrac{1}{2} L^{-1} + \tfrac{1}{2} K^{-1}\right)^{-1}$$
is homogeneous of degree 1.

An important calculus property of homogeneous functions is Euler's theorem:
Theorem 243 Euler's Theorem: If $f(x_1, x_2, \ldots, x_n)$ is homogeneous of degree $k$ then:
$$k f(x_1, x_2, \ldots, x_n) = \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_1}\, x_1 + \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_2}\, x_2 + \cdots + \frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_n}\, x_n.$$
Proof. Let:
$$h(\lambda) = f(\lambda x_1, \lambda x_2, \ldots, \lambda x_n) = \lambda^k f(x_1, x_2, \ldots, x_n).$$
Using the multivariate chain rule on $f(\lambda x)$ and the fact that $\frac{d\lambda^k}{d\lambda} = k\lambda^{k-1}$, we find that:
$$h'(\lambda) = \frac{\partial f(\lambda x_1, \ldots, \lambda x_n)}{\partial x_1}\, x_1 + \cdots + \frac{\partial f(\lambda x_1, \ldots, \lambda x_n)}{\partial x_n}\, x_n = k\lambda^{k-1} f(x_1, x_2, \ldots, x_n).$$
Now set $\lambda = 1$ and the result follows.
Example 1: We have seen that:
$$F(L, K) = L^{\frac{1}{2}} K^{\frac{1}{4}}$$
is homogeneous of degree $k = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}$. To verify Euler's theorem note that:
$$\frac{\partial F(L, K)}{\partial L} \times L + \frac{\partial F(L, K)}{\partial K} \times K = \frac{1}{2} L^{\frac{1}{2}-1} K^{\frac{1}{4}} \times L + \frac{1}{4} L^{\frac{1}{2}} K^{\frac{1}{4}-1} \times K = \frac{1}{2} L^{\frac{1}{2}} K^{\frac{1}{4}} + \frac{1}{4} L^{\frac{1}{2}} K^{\frac{1}{4}} = \frac{3}{4} L^{\frac{1}{2}} K^{\frac{1}{4}} = \frac{3}{4} F(L, K).$$
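Euler's theorem for this production function can also be confirmed symbolically (a sketch of mine):

```python
import sympy as sp

L, K = sp.symbols('L K', positive=True)
F = L**sp.Rational(1, 2) * K**sp.Rational(1, 4)

# Euler sum F_L * L + F_K * K should equal k * F with k = 3/4.
euler_sum = sp.diff(F, L)*L + sp.diff(F, K)*K
```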
Example 2: Euler's theorem allows us to make predictions about a competitive firm's profits. Suppose $Q = F(L, K)$ is homogeneous of degree $k$. Then by Euler's theorem:
$$kQ = \frac{\partial F(L, K)}{\partial L}\, L + \frac{\partial F(L, K)}{\partial K}\, K.$$
A perfectly competitive firm maximizes profits by setting $\frac{\partial F(L, K)}{\partial L} = \frac{W}{P}$ and $\frac{\partial F(L, K)}{\partial K} = \frac{R}{P}$, so that:
$$kQ = \frac{W}{P} L + \frac{R}{P} K \implies k P Q = W L + R K$$
and so profits $\pi$ are given by:
$$\pi = P Q - (W L + R K) = P Q - k P Q = (1 - k) P Q.$$
Thus if $0 < k < 1$ (decreasing returns to scale) then $\pi > 0$, while if $k = 1$ (constant returns to scale) then $\pi = 0$. If $k > 1$ then profits must be negative, which reflects the fact that increasing returns to scale are not consistent with perfect competition.
Another useful calculus result for homogeneous functions is:

Theorem 244 If $f(x_1, x_2, \ldots, x_n)$ is homogeneous of degree $k$ then:
$$\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}$$
is homogeneous of degree $k - 1$.
Example: While $Q = L^{\frac{1}{2}} K^{\frac{1}{4}}$ is homogeneous of degree $k = \frac{3}{4}$, the marginal product of labour:
$$\frac{\partial}{\partial L}\left(L^{\frac{1}{2}} K^{\frac{1}{4}}\right) = \frac{1}{2} L^{-\frac{1}{2}} K^{\frac{1}{4}}$$
is homogeneous of degree $k - 1 = \frac{3}{4} - 1 = -\frac{1}{4}$.
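Theorem 244 can be checked numerically for this marginal product (my sketch; the point and $\lambda$ are arbitrary choices):

```python
def MPL(L, K):
    return 0.5 * L**-0.5 * K**0.25

L, K, lam = 4.0, 16.0, 3.0

# MPL should be homogeneous of degree -1/4.
lhs = MPL(lam * L, lam * K)
rhs = lam**-0.25 * MPL(L, K)
```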
4.2.9 Homogeneity and the Absence of Money Illusion
Consider a demand function:
$$Q = Q(P, P_1, P_2, \ldots, P_n, Y)$$
where $P$ is the own price, $P_1, P_2, \ldots, P_n$ are the prices of related goods, and $Y$ is nominal income. Now suppose there is a general inflation, so that all prices and incomes increase by the same proportion $\lambda$. For example, suppose $\lambda = 2$, so that all prices and incomes double. This means that real income and all real prices have stayed the same, and so a rational household, that is, one that does not suffer from money illusion, will not change any of its real behaviour, and so $Q$ remains the same.
Mathematically this means that:
$$Q(\lambda P, \lambda P_1, \lambda P_2, \ldots, \lambda P_n, \lambda Y) = Q(P, P_1, P_2, \ldots, P_n, Y) = \lambda^0 Q(P, P_1, P_2, \ldots, P_n, Y).$$
Thus the absence of money illusion is equivalent to the demand function being homogeneous of degree $k = 0$. This logic applies equally to supply curves as well as to many other functions in economics.
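For the coffee demand curve used earlier, the exponents sum to $-2 + 3 - 3 + 2 = 0$, so it passes this no-money-illusion test; a numeric sketch (mine):

```python
def Q(P, P1, P2, Y):
    return P**-2 * P1**3 * P2**-3 * Y**2

# Doubling every price and income should leave quantity demanded unchanged.
base = Q(2.0, 3.0, 1.5, 10.0)
doubled = Q(4.0, 6.0, 3.0, 20.0)
```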
4.2.10 Homogeneity and the Nature of Technology
In economics the nature of technology is often critical, and what is often critical about it is what happens as the scale of production is increased: is bigger better, or is small beautiful? This can be captured by the degree of homogeneity of the production function.

Suppose that a production function:
$$Q = F(L, K)$$
is homogeneous of degree $k$. If we double the scale of operation, so that $\lambda = 2$, then:
$$F(2L, 2K) = 2^k F(L, K).$$
This says that doubling the scale of operation causes output to increase by a factor of $2^k$. Now:

1. If $k > 1$ (e.g. $F(L, K) = L^{\frac{1}{2}} K^{\frac{3}{4}}$ with $k = \frac{5}{4}$) then doubling the scale leads to more than twice the output, since then $2^k > 2$. We say that the technology exhibits increasing returns to scale. Bigger is better.

2. If $k = 1$ (e.g. $F(L, K) = L^{\frac{1}{2}} K^{\frac{1}{2}}$ with $k = 1$) then doubling the scale leads to exactly twice the output, since then $2^1 = 2$. We say that the technology exhibits constant returns to scale.

3. If $k < 1$ (e.g. $F(L, K) = L^{\frac{1}{2}} K^{\frac{1}{4}}$ with $k = \frac{3}{4}$) then doubling the scale leads to less than twice the output, since then $2^k < 2$. We say that the technology exhibits decreasing returns to scale. Small is beautiful.
4.3 Second-Order Partial Derivatives
We are going to be interested in second-order partial derivatives when we discuss the concavity, convexity and second-order conditions of multivariate functions. We have:

Definition 245 Given $y = f(x_1, x_2, \ldots, x_n)$, the second-order partial derivative with respect to $x_i$ and $x_j$ is:
$$\frac{\partial^2 f(x_1, x_2, \ldots, x_n)}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\left(\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}\right).$$
Remark 1: If there are $n$ $x_i$'s then there are $n$ first-order partial derivatives and $n^2$ second-order partial derivatives. For example, the function $y = f(x_1, x_2)$ has 2 first-order partial derivatives but $2^2 = 4$ second-order partial derivatives:
$$\frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_1}, \quad \frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2}, \quad \frac{\partial^2 f(x_1, x_2)}{\partial x_2 \partial x_1}, \quad \frac{\partial^2 f(x_1, x_2)}{\partial x_2 \partial x_2}.$$

Remark 2: The notation is usually a little different when we differentiate twice with respect to the same $x_i$, in which case we typically (but not always) write:
$$\frac{\partial^2 f(x_1, x_2, \ldots, x_n)}{\partial x_i^2} \quad \text{and not} \quad \frac{\partial^2 f(x_1, x_2, \ldots, x_n)}{\partial x_i \partial x_i},$$
that is, instead of $\partial x_i \partial x_i$ in the denominator one writes $\partial x_i^2$.
Example: Consider:
$$y = f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2.$$
We have 4 second-order partial derivatives:
$$\frac{\partial^2 f(x_1, x_2)}{\partial x_1^2} = \frac{\partial}{\partial x_1}\left(\frac{\partial f(x_1, x_2)}{\partial x_1}\right) = \frac{\partial}{\partial x_1}\left(\frac{1}{3 x_1} - x_2\right) = -\frac{1}{3 x_1^2}$$
$$\frac{\partial^2 f(x_1, x_2)}{\partial x_2 \partial x_1} = \frac{\partial}{\partial x_2}\left(\frac{\partial f(x_1, x_2)}{\partial x_1}\right) = \frac{\partial}{\partial x_2}\left(\frac{1}{3 x_1} - x_2\right) = -1$$
$$\frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2} = \frac{\partial}{\partial x_1}\left(\frac{\partial f(x_1, x_2)}{\partial x_2}\right) = \frac{\partial}{\partial x_1}\left(\frac{2}{3 x_2} - x_1\right) = -1$$
$$\frac{\partial^2 f(x_1, x_2)}{\partial x_2^2} = \frac{\partial}{\partial x_2}\left(\frac{\partial f(x_1, x_2)}{\partial x_2}\right) = \frac{\partial}{\partial x_2}\left(\frac{2}{3 x_2} - x_1\right) = -\frac{2}{3 x_2^2}.$$
Note that in this example:
$$\frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2} = \frac{\partial^2 f(x_1, x_2)}{\partial x_2 \partial x_1},$$
that is, we get the same result if we first differentiate with respect to $x_1$ and then with respect to $x_2$ as when we first differentiate with respect to $x_2$ and then with respect to $x_1$. Alternatively, we get the same result if we first apply $\frac{\partial}{\partial x_1}$ and then $\frac{\partial}{\partial x_2}$ as if we first apply $\frac{\partial}{\partial x_2}$ and then $\frac{\partial}{\partial x_1}$.

This turns out not to be a coincidence but is true for all functions. This very useful result is known as Young's theorem:

Theorem 246 Young's Theorem: Given $y = f(x_1, x_2, \ldots, x_n)$, differentiating first with respect to $x_i$ and then with respect to $x_j$ gives the same result as differentiating first with respect to $x_j$ and then with respect to $x_i$, so that:
$$\frac{\partial}{\partial x_i}\left(\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_j}\right) = \frac{\partial}{\partial x_j}\left(\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}\right)$$
or:
$$\frac{\partial^2 f(x_1, x_2, \ldots, x_n)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x_1, x_2, \ldots, x_n)}{\partial x_j \partial x_i}.$$
Example: Given:
$$y = f(x_1, x_2) = e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)},$$
if we first differentiate with respect to $x_1$ we have:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = -x_1 e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)} \implies \frac{\partial^2 f(x_1, x_2)}{\partial x_2 \partial x_1} = \frac{\partial}{\partial x_2}\left(-x_1 e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)}\right) = x_1 x_2 e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)}$$
while if we first differentiate with respect to $x_2$ we have:
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = -x_2 e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)} \implies \frac{\partial^2 f(x_1, x_2)}{\partial x_1 \partial x_2} = \frac{\partial}{\partial x_1}\left(-x_2 e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)}\right) = x_1 x_2 e^{-\frac{1}{2}\left(x_1^2 + x_2^2\right)}.$$
Both yield the same result, as required by Young's theorem.
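Young's theorem for this example can be confirmed with sympy (a sketch of mine):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.exp(-(x1**2 + x2**2)/2)

# Cross-partials taken in both orders.
cross12 = sp.diff(f, x1, x2)
cross21 = sp.diff(f, x2, x1)
```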
4.3.1 The Hessian
A multivariate function $y = f(x_1, x_2, \ldots, x_n)$ has a large number of second-order partial derivatives. The best way to organize these $n^2$ second derivatives is to put them into an $n \times n$ matrix called the Hessian, defined below:

Definition 247 Hessian: Given $y = f(x_1, x_2, \ldots, x_n) = f(x)$ where $x$ is an $n \times 1$ vector, the Hessian is:
$$H(x_1, x_2, \ldots, x_n) = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{bmatrix}.$$
Note that by Young's theorem:
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x)}{\partial x_j \partial x_i}$$
and so the elements above the diagonal of the Hessian are equal to the corresponding elements below the diagonal, and so:

Theorem 248 Matrix Version of Young's Theorem: The Hessian $H(x_1, x_2, \ldots, x_n)$ is a symmetric matrix, or:
$$H(x_1, x_2, \ldots, x_n) = H(x_1, x_2, \ldots, x_n)^T.$$

Remark: Young's theorem reduces the number of different second-order partial derivatives that we need to calculate from the $n^2$ elements in the Hessian to the $\frac{n(n+1)}{2}$ elements on and above the diagonal. For example, if $n = 4$, rather than calculating $4^2 = 16$ second derivatives we need only calculate $\frac{4(4+1)}{2} = 10$ different second derivatives.
Calculating the Elements of the Hessian

If you have trouble remembering how to construct the Hessian, write down a blank square matrix and along the top and left side of the matrix list the $x_i$'s as follows:
$$\begin{array}{c|cccc} & x_1 & x_2 & \cdots & x_n \\ \hline x_1 & & & & \\ x_2 & & & & \\ \vdots & & & & \\ x_n & & & & \end{array}$$
To fill in the $i,j$-th entry, read the corresponding $x_i$ to the left and $x_j$ above, and differentiate with respect to these two variables.
Example 1: For the function:
f (x1 ; x2 ) =
1
2
ln (x1 ) + ln (x2 ) ¡ x1 x2
3
3
the second derivatives are:
@ 2 f (x1 ; x2 )
@x21
@ 2 f (x1 ; x2 )
@x22
1 @ 2 f (x1 ; x2 )
@ 2 f (x1 ; x2 )
;
=
= ¡1;
2
3x1
@x2 @x1
@x1 @x2
2
= ¡ 2:
3x2
= ¡
To calculate the Hessian $H(x_1, x_2)$ one writes:
$$\begin{array}{c|cc} & x_1 & x_2 \\ \hline x_1 & & ? \\ x_2 & & \end{array}$$
Thus for the $1,2$ element where the $?$ is placed, we differentiate first with respect to $x_1$ (off the left side of the $1,2$ element), and then with respect to $x_2$ (directly above the $1,2$ element). This then is
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_1 \partial x_2} = -1.$$
By Young's theorem the $2,1$ and $1,2$ elements are identical and so we obtain:
$$\begin{array}{c|cc} & x_1 & x_2 \\ \hline x_1 & ? & -1 \\ x_2 & -1 & \end{array}$$
To obtain the $1,1$ element where the $?$ is now placed, we differentiate first with respect to $x_1$, reading left, and then again with respect to $x_1$, reading above, which is:
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_1^2} = -\frac{1}{3x_1^2}$$
and so we now have:
$$\begin{array}{c|cc} & x_1 & x_2 \\ \hline x_1 & -\frac{1}{3x_1^2} & -1 \\ x_2 & -1 & ? \end{array}$$
To finish our calculation we calculate the $2,2$ element where the $?$ is placed by differentiating twice with respect to $x_2$ as:
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_2^2} = -\frac{2}{3x_2^2}$$
and so the Hessian is given by:
$$H(x_1, x_2) = \begin{bmatrix} -\frac{1}{3x_1^2} & -1 \\ -1 & -\frac{2}{3x_2^2} \end{bmatrix}.$$
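The bordered-matrix recipe above can be cross-checked symbolically. The short Python sketch below (sympy is assumed available; it is not part of the text) builds the Hessian of Example 1 in one call:

```python
# A quick symbolic check of the Hessian in Example 1 (a sketch,
# assuming sympy is available; not part of the original text).
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = sp.log(x1)/3 + 2*sp.log(x2)/3 - x1*x2

# sp.hessian builds the matrix of second partials in one call
H = sp.hessian(f, (x1, x2))
print(H)  # entries match -1/(3 x1^2), -1, -1, -2/(3 x2^2)
```

The off-diagonal entries coming out equal is exactly Young's theorem at work.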
Example 2: For the Cobb-Douglas production function:
$$Q = F(L, K) = L^{\frac{1}{2}} K^{\frac{1}{4}}$$
we calculate the Hessian by first listing $L$ and $K$ along the top and left-side of a $2 \times 2$ matrix as:
$$\begin{array}{c|cc} & L & K \\ \hline L & & ? \\ K & & \end{array}$$
Thus for the $1,2$ element where the $?$ is placed we differentiate once with respect to $L$ and once with respect to $K$, yielding:
$$\frac{\partial F(L,K)}{\partial L} = \frac{1}{2}L^{-\frac{1}{2}}K^{\frac{1}{4}} \implies \frac{\partial^2 F(L,K)}{\partial L \partial K} = \frac{\partial}{\partial K}\left(\frac{1}{2}L^{-\frac{1}{2}}K^{\frac{1}{4}}\right) = \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}}.$$
By Young's theorem the $1,2$ and $2,1$ elements are identical and so we obtain:
$$\begin{array}{c|cc} & L & K \\ \hline L & ? & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} \\ K & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} & \end{array}$$
To find the $1,1$ element where the $?$ is placed we differentiate twice with respect to $L$ so that:
$$\frac{\partial F(L,K)}{\partial L} = \frac{1}{2}L^{-\frac{1}{2}}K^{\frac{1}{4}} \implies \frac{\partial^2 F(L,K)}{\partial L^2} = \frac{\partial}{\partial L}\left(\frac{1}{2}L^{-\frac{1}{2}}K^{\frac{1}{4}}\right) = -\frac{1}{4}L^{-\frac{3}{2}}K^{\frac{1}{4}}$$
and so we obtain:
$$\begin{array}{c|cc} & L & K \\ \hline L & -\frac{1}{4}L^{-\frac{3}{2}}K^{\frac{1}{4}} & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} \\ K & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} & ? \end{array}$$
Finally to find the $2,2$ element where the $?$ is placed we differentiate twice with respect to $K$ so that:
$$\frac{\partial F(L,K)}{\partial K} = \frac{1}{4}L^{\frac{1}{2}}K^{-\frac{3}{4}} \implies \frac{\partial^2 F(L,K)}{\partial K^2} = \frac{\partial}{\partial K}\left(\frac{1}{4}L^{\frac{1}{2}}K^{-\frac{3}{4}}\right) = -\frac{3}{16}L^{\frac{1}{2}}K^{-\frac{7}{4}}$$
so that the Hessian is given by:
$$H(L, K) = \begin{bmatrix} -\frac{1}{4}L^{-\frac{3}{2}}K^{\frac{1}{4}} & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} \\ \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} & -\frac{3}{16}L^{\frac{1}{2}}K^{-\frac{7}{4}} \end{bmatrix}.$$

4.3.2 Concavity and Convexity
In univariate calculus a function $y = f(x)$ is concave (a mountain) if $f''(x) < 0$ and convex (a valley) if $f''(x) > 0$, where these mountains and valleys are in the two-dimensional space $\mathbb{R}^2$.

In multivariate calculus with $n$ $x_i$'s, $y = f(x_1, x_2, \dots, x_n)$, instead of $f''(x)$ we look at the $n \times n$ Hessian $H(x_1, x_2, \dots, x_n)$ to determine if a function is concave (a mountain) or convex (a valley) in the $(n+1)$-dimensional space $\mathbb{R}^{n+1}$.
Let us start with some easy examples. Consider the multivariate function
$$f(x_1, x_2) = -\frac{1}{2}\left(x_1^2 + x_2^2\right)$$
plotted below:

[Figure: surface plot of $y = f(x_1, x_2) = -\frac{1}{2}(x_1^2 + x_2^2)$ over the $(x_1, x_2)$ plane]

which is concave (a mountain) in 3 dimensions. You can verify that the Hessian for this function is:
$$H(x_1, x_2) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$
which is a negative definite matrix (since it is a diagonal matrix with negative elements along the diagonal). Note the parallel: in univariate calculus a function is concave if $f''(x)$ is negative; in multivariate calculus a function is concave if its Hessian $H(x)$ is negative definite.
The function:
$$f(x_1, x_2) = \frac{1}{2}\left(x_1^2 + x_2^2\right)$$
which is plotted below:

[Figure: surface plot of $y = f(x_1, x_2) = \frac{1}{2}(x_1^2 + x_2^2)$ over the $(x_1, x_2)$ plane]

is convex or a valley in 3 dimensions. You can verify that the Hessian for this function is:
$$H(x_1, x_2) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
which is a positive definite matrix (since it is diagonal with positive elements along the diagonal). Again note the parallel: in univariate calculus a function is convex if $f''(x)$ is positive; in multivariate calculus a function is convex if its Hessian $H(x)$ is positive definite.
We have:
Definition 249 Concavity: The function $y = f(x_1, x_2, \dots, x_n)$ is concave if the Hessian $H(x_1, x_2, \dots, x_n)$ is a negative definite matrix.

Definition 250 Convexity: The function $y = f(x_1, x_2, \dots, x_n)$ is convex if the Hessian $H(x_1, x_2, \dots, x_n)$ is a positive definite matrix.
Remark: As before we can distinguish between local concavity and convexity and global concavity and convexity. Thus:

Definition 251 Local Concavity: The function $y = f(x_1, x_2, \dots, x_n)$ is locally concave at a point $x_1^0, x_2^0, \dots, x_n^0$ if the Hessian evaluated at $x_1^0, x_2^0, \dots, x_n^0$, or $H(x_1^0, x_2^0, \dots, x_n^0)$, is a negative definite matrix.

Definition 252 Local Convexity: The function $y = f(x_1, x_2, \dots, x_n)$ is locally convex at a point $x_1^0, x_2^0, \dots, x_n^0$ if the Hessian evaluated at $x_1^0, x_2^0, \dots, x_n^0$, or $H(x_1^0, x_2^0, \dots, x_n^0)$, is a positive definite matrix.

Definition 253 Global Concavity: The function $y = f(x_1, x_2, \dots, x_n)$ is globally concave if the Hessian $H(x_1, x_2, \dots, x_n)$ is a negative definite matrix for all $x_1, x_2, \dots, x_n$ in the domain of $f(x_1, x_2, \dots, x_n)$.

Definition 254 Global Convexity: The function $y = f(x_1, x_2, \dots, x_n)$ is globally convex if the Hessian $H(x_1, x_2, \dots, x_n)$ is a positive definite matrix for all $x_1, x_2, \dots, x_n$ in the domain of $f(x_1, x_2, \dots, x_n)$.
Example 1: We have seen from a previous example that the function
$$y = f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2$$
has a Hessian
$$H(x_1, x_2) = \begin{bmatrix} -\frac{1}{3x_1^2} & -1 \\ -1 & -\frac{2}{3x_2^2} \end{bmatrix}.$$
The function is locally concave at $x_1^0 = \frac{1}{2}$ and $x_2^0 = \frac{1}{3}$ since:
$$H\left(\frac{1}{2}, \frac{1}{3}\right) = \begin{bmatrix} -\frac{1}{3(1/2)^2} & -1 \\ -1 & -\frac{2}{3(1/3)^2} \end{bmatrix} = \begin{bmatrix} -\frac{4}{3} & -1 \\ -1 & -6 \end{bmatrix}$$
is a negative definite matrix, as can be shown from the leading principal minors, where $M_1 = -\frac{4}{3} < 0$ and $M_2 = 7 > 0$, or from the eigenvalues, which are $\lambda_1 = -1.13 < 0$ and $\lambda_2 = -6.21 < 0$.
The function is not globally concave since at another point where $x_1^0 = x_2^0 = 1$:
$$H(1, 1) = \begin{bmatrix} -\frac{1}{3} & -1 \\ -1 & -\frac{2}{3} \end{bmatrix}$$
which is not negative definite, since this requires that both eigenvalues be negative, but:
$$\lambda_1 = -\frac{1}{2} + \frac{1}{6}\sqrt{37} \approx 0.514 > 0, \qquad \lambda_2 = -\frac{1}{2} - \frac{1}{6}\sqrt{37} \approx -1.514 < 0.$$
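The definiteness tests above are easy to confirm numerically: a symmetric matrix is negative definite exactly when all its eigenvalues are negative. A minimal numpy sketch (numpy assumed available; not part of the text):

```python
# Numerical check of the definiteness claims in Example 1 (a sketch,
# assuming numpy is available; not part of the original text).
import numpy as np

def hessian(x1, x2):
    # Hessian of f = ln(x1)/3 + 2 ln(x2)/3 - x1*x2 from the text
    return np.array([[-1/(3*x1**2), -1.0],
                     [-1.0, -2/(3*x2**2)]])

# At (1/2, 1/3) both eigenvalues are negative: locally concave there.
eig_a = np.linalg.eigvalsh(hessian(0.5, 1/3))
print(eig_a)  # roughly [-6.21, -1.13]

# At (1, 1) one eigenvalue is positive: not negative definite there,
# so the function is not globally concave.
eig_b = np.linalg.eigvalsh(hessian(1.0, 1.0))
print(eig_b)  # roughly [-1.514, 0.514]
```

`eigvalsh` is the symmetric-matrix eigenvalue routine and returns the eigenvalues in ascending order.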
Example 2: Consider the Cobb-Douglas production function:
$$Q = L^{\frac{1}{2}} K^{\frac{1}{4}}$$
plotted below:

[Figure: surface plot of $Q = L^{1/2} K^{1/4}$ for $0 \le L \le 5$ and $0 \le K \le 5$]

From the graph the function appears mountain-like or concave. To verify this let us look at the Hessian calculated in a previous example:
$$H(L, K) = \begin{bmatrix} -\frac{1}{4}L^{-\frac{3}{2}}K^{\frac{1}{4}} & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} \\ \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} & -\frac{3}{16}L^{\frac{1}{2}}K^{-\frac{7}{4}} \end{bmatrix}.$$
The first leading principal minor $M_1$ is negative for all $L$ and $K$ since:
$$M_1 = -\frac{1}{4}L^{-\frac{3}{2}}K^{\frac{1}{4}} < 0.$$
The second leading principal minor $M_2$ is positive since:
$$M_2 = \det \begin{bmatrix} -\frac{1}{4}L^{-\frac{3}{2}}K^{\frac{1}{4}} & \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} \\ \frac{1}{8}L^{-\frac{1}{2}}K^{-\frac{3}{4}} & -\frac{3}{16}L^{\frac{1}{2}}K^{-\frac{7}{4}} \end{bmatrix} = \frac{3}{64}L^{-1}K^{-\frac{3}{2}} - \frac{1}{64}L^{-1}K^{-\frac{3}{2}} = \frac{1}{32}L^{-1}K^{-\frac{3}{2}} > 0.$$
It follows that $H(L, K)$ is a negative definite matrix for all $L$ and $K$ and hence that $Q = L^{\frac{1}{2}}K^{\frac{1}{4}}$ is globally concave.
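The leading principal minors above can also be obtained symbolically; this sympy sketch (sympy assumed available, not part of the text) reproduces both minors for all positive $L$ and $K$:

```python
# Symbolic check that Q = L^(1/2) K^(1/4) is globally concave (a sketch,
# assuming sympy is available; not part of the original text).
import sympy as sp

L, K = sp.symbols('L K', positive=True)
Q = L**sp.Rational(1, 2) * K**sp.Rational(1, 4)
H = sp.hessian(Q, (L, K))

M1 = sp.simplify(H[0, 0])    # first leading principal minor
M2 = sp.simplify(H.det())    # second leading principal minor
print(M1)  # equals -(1/4) L^(-3/2) K^(1/4), negative for L, K > 0
print(M2)  # equals (1/32) L^(-1) K^(-3/2), positive for L, K > 0
```

Since $M_1 < 0$ and $M_2 > 0$ hold as symbolic identities on $L, K > 0$, the global-concavity conclusion follows for the whole domain, not just for sampled points.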
You may want to attempt to prove the following results:

Theorem 255 The Cobb-Douglas production function:
$$f(L, K) = L^{\alpha} K^{\beta}$$
with $\alpha > 0$ and $\beta > 0$ is globally concave if and only if $\alpha + \beta < 1$.

Theorem 256 $f(x_1, x_2, \dots, x_n)$ is globally concave (convex) if and only if
$$g(x_1, x_2, \dots, x_n) = -1 \times f(x_1, x_2, \dots, x_n)$$
is globally convex (concave).

Theorem 257 If $f(x_1, x_2, \dots, x_n)$ is globally concave (convex) and $g(x_1, x_2, \dots, x_n)$ is globally concave (convex) then
$$h(x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n) + g(x_1, x_2, \dots, x_n)$$
is globally concave (convex).
4.3.3 First and Second-Order Taylor Series

We saw in univariate calculus that Taylor series can be used to approximate an arbitrary function by a linear function or a quadratic. Similar results apply for multivariate functions. In particular:

Theorem 258 If $x$ is an $n \times 1$ vector and $f(x)$ is a multivariate function, a first-order Taylor series of $f(x)$ around the point $x^0$ is given by:
$$f(x^0) + \nabla f(x^0)^T (x - x^0)$$
while a second-order Taylor series around $x^0$ is given by:
$$f(x^0) + \nabla f(x^0)^T (x - x^0) + \frac{1}{2}(x - x^0)^T H(x^0)(x - x^0)$$
where $\nabla f(x)$ is the gradient and $H(x)$ the Hessian of $f(x)$.
Example: Given:
$$f(x_1, x_2) = x_1^5 x_2^3$$
suppose we wish to calculate a Taylor series approximation around $x_1^0 = 1$ and $x_2^0 = 2$. We then have $f(1, 2) = (1)^5(2)^3 = 8$ and
$$\frac{\partial f(x_1,x_2)}{\partial x_1} = 5x_1^4 x_2^3 \implies \frac{\partial f(1,2)}{\partial x_1} = 5(1)^4(2)^3 = 40$$
$$\frac{\partial f(x_1,x_2)}{\partial x_2} = 3x_1^5 x_2^2 \implies \frac{\partial f(1,2)}{\partial x_2} = 3(1)^5(2)^2 = 12$$
so that the gradient at $(1, 2)$ is
$$\nabla f(x^0) \equiv \nabla f(1, 2) = \begin{bmatrix} 40 \\ 12 \end{bmatrix}$$
and a first-order Taylor series around $x_1^0 = 1$ and $x_2^0 = 2$ would be
$$f(x^0) + \nabla f(x^0)^T (x - x^0) = 8 + \begin{bmatrix} 40 & 12 \end{bmatrix}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right) = 8 + 40(x_1 - 1) + 12(x_2 - 2).$$

To calculate a second-order Taylor series we need the second derivatives:
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_1^2} = 20x_1^3 x_2^3 \implies \frac{\partial^2 f(1,2)}{\partial x_1^2} = 20(1)^3(2)^3 = 160$$
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_1 \partial x_2} = 15x_1^4 x_2^2 \implies \frac{\partial^2 f(1,2)}{\partial x_1 \partial x_2} = 15(1)^4(2)^2 = 60$$
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_2^2} = 6x_1^5 x_2 \implies \frac{\partial^2 f(1,2)}{\partial x_2^2} = 6(1)^5(2)^1 = 12$$
so that the Hessian at $(1, 2)$ is:
$$H(1, 2) = \begin{bmatrix} 160 & 60 \\ 60 & 12 \end{bmatrix}.$$
Thus the second-order Taylor series is:
$$f(x^0) + \nabla f(x^0)^T (x - x^0) + \frac{1}{2}(x - x^0)^T H(x^0)(x - x^0)$$
$$= 8 + 40(x_1 - 1) + 12(x_2 - 2) + \frac{1}{2}\begin{bmatrix} x_1 - 1 & x_2 - 2 \end{bmatrix}\begin{bmatrix} 160 & 60 \\ 60 & 12 \end{bmatrix}\begin{bmatrix} x_1 - 1 \\ x_2 - 2 \end{bmatrix}$$
$$= 8 + 40(x_1 - 1) + 12(x_2 - 2) + 80(x_1 - 1)^2 + 6(x_2 - 2)^2 + 60(x_1 - 1)(x_2 - 2).$$
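A quadratic Taylor approximation should match the function exactly at the expansion point and track it closely nearby. The plain-Python sketch below (an addition, not part of the text) spot-checks the expansion of $f(x_1, x_2) = x_1^5 x_2^3$ around $(1, 2)$:

```python
# Spot-check of the second-order Taylor series of f = x1^5 x2^3
# around (1, 2): a sketch in plain Python, not part of the text.
def f(x1, x2):
    return x1**5 * x2**3

def taylor2(x1, x2):
    # second-order expansion around (1, 2): f = 8, grad = (40, 12),
    # Hessian = [[160, 60], [60, 12]]
    d1, d2 = x1 - 1.0, x2 - 2.0
    return 8 + 40*d1 + 12*d2 + 80*d1**2 + 6*d2**2 + 60*d1*d2

print(f(1.0, 2.0), taylor2(1.0, 2.0))    # both equal 8 at the point
print(f(1.05, 2.1), taylor2(1.05, 2.1))  # close but not identical
```

The residual away from $(1, 2)$ is third order in the displacement, which is why the two values agree to about two decimal places at $(1.05, 2.1)$.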
4.4 Unconstrained Optimization

4.4.1 First-Order Conditions

The first-order conditions for a maximum or minimum of a function of $n$ variables:
$$y = f(x_1, x_2, \dots, x_n)$$
are:

Theorem 259 First-Order Conditions: If $x_1^*, x_2^*, \dots, x_n^*$ maximizes or minimizes the function $y = f(x_1, x_2, \dots, x_n)$ then:
$$\frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_1} = 0, \quad \frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_2} = 0, \quad \cdots, \quad \frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_n} = 0.$$
Proof. (by contradiction): Suppose $x_1^*, x_2^*, \dots, x_n^*$ maximizes (minimizes) $y$ and suppose it were the case that $\frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} > 0$. It follows then at $x_1^*, x_2^*, \dots, x_n^*$ that $y$ is a locally increasing function of $x_i$, so that increasing (decreasing) $x_i$ and keeping all other variables fixed would increase (decrease) $y$. This however contradicts $x_1^*, x_2^*, \dots, x_n^*$ being a maximum (minimum), and so $\frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} > 0$ is not possible. Similarly if $\frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} < 0$ then $y$ is a locally decreasing function of $x_i$, and so if we decreased (increased) $x_i$ keeping all other variables fixed then $y$ would increase (decrease). Again this contradicts $x_1^*, x_2^*, \dots, x_n^*$ being a maximum (minimum), and so $\frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} < 0$ is not possible. It follows then that:
$$\frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} = 0.$$
We can also express the first-order conditions more compactly using the gradient $\nabla f(x) \equiv \frac{\partial f(x)}{\partial x}$ evaluated at $x^*$ so that:

Theorem 260 If the $n \times 1$ vector $x^*$ maximizes or minimizes $y = f(x)$ then:
$$\nabla f(x^*) \equiv \frac{\partial f(x^*)}{\partial x} = 0$$
where $0$ is an $n \times 1$ vector of zeros.

Remark: The first-order conditions for a maximum or minimum involve $n$ equations in $n$ unknowns $x_1^*, x_2^*, \dots, x_n^*$. If the problem is "nice" then it is sometimes possible to solve these $n$ equations for the $n$ unknowns $x_1^*, x_2^*, \dots, x_n^*$. Even when we cannot explicitly solve these equations we can often learn a lot about the nature of the solution by examining the first-order conditions.
Although finding the first-order conditions is generally straightforward, there are a few pitfalls that students can avoid by using the following recipe:

Deriving the First-Order Conditions

1. Calculate the $n$ first-order partial derivatives $\frac{\partial f(x_1, x_2, \dots, x_n)}{\partial x_i}$ for $i = 1, 2, \dots, n$.

2. Put $*$'s on all the $x_i$'s in 1 and set each partial derivative equal to zero.

3. If possible solve for $x_1^*, x_2^*, \dots, x_n^*$, or if not, examine the first-order conditions for anything you can learn about the optimal values.
Example 1: Consider the function:
$$f(x_1, x_2) = 3x_1^2 - 6x_1 x_2 + 5x_2^2 - 4x_1 - 2x_2 + 8.$$

1. Calculating the first derivatives we find:
$$\frac{\partial f(x_1,x_2)}{\partial x_1} = 6x_1 - 6x_2 - 4, \qquad \frac{\partial f(x_1,x_2)}{\partial x_2} = -6x_1 + 10x_2 - 2.$$

2. Putting $*$'s on the $x_i$'s in 1 and setting these derivatives equal to $0$ results in:
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_1} = 6x_1^* - 6x_2^* - 4 = 0$$
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_2} = -6x_1^* + 10x_2^* - 2 = 0.$$

3. We can solve the first-order conditions since in matrix notation we obtain:
$$\begin{bmatrix} 6 & -6 \\ -6 & 10 \end{bmatrix}\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$$
so that using Cramer's rule we find that:
$$x_1^* = \frac{\det \begin{bmatrix} 4 & -6 \\ 2 & 10 \end{bmatrix}}{\det \begin{bmatrix} 6 & -6 \\ -6 & 10 \end{bmatrix}} = \frac{52}{24} = \frac{13}{6}, \qquad x_2^* = \frac{\det \begin{bmatrix} 6 & 4 \\ -6 & 2 \end{bmatrix}}{\det \begin{bmatrix} 6 & -6 \\ -6 & 10 \end{bmatrix}} = \frac{36}{24} = \frac{3}{2}.$$
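Because these first-order conditions are linear, the Cramer's-rule answer can be reproduced with any linear solver. A minimal numpy sketch (numpy assumed available; not part of the text):

```python
# The FOCs of Example 1 form the linear system A x* = b; solving it
# reproduces the Cramer's-rule answer (a sketch, assuming numpy).
import numpy as np

A = np.array([[6.0, -6.0],
              [-6.0, 10.0]])
b = np.array([4.0, 2.0])

x_star = np.linalg.solve(A, b)
print(x_star)  # approximately [2.1666..., 1.5], i.e. [13/6, 3/2]
```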
Example 2: Consider the function:
$$y = f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 - x_2.$$
Following the recipe we have:

1. The first derivatives are:
$$\frac{\partial f(x_1,x_2)}{\partial x_1} = \frac{1}{3x_1} - 1, \qquad \frac{\partial f(x_1,x_2)}{\partial x_2} = \frac{2}{3x_2} - 1.$$

2. Putting $*$'s on the $x_i$'s in 1 and setting these derivatives equal to $0$:
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_1} = \frac{1}{3x_1^*} - 1 = 0, \qquad \frac{\partial f(x_1^*, x_2^*)}{\partial x_2} = \frac{2}{3x_2^*} - 1 = 0.$$

3. Solving we find that $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$.
Example 3: Consider the function:
$$f(x_1, x_2) = x_1^{1/4} x_2^{1/2} - 3x_1 - 2x_2$$
where $x_1 > 0$ and $x_2 > 0$. Following the recipe we have:

1. The first derivatives are:
$$\frac{\partial f(x_1,x_2)}{\partial x_1} = \frac{1}{4}x_1^{-3/4} x_2^{1/2} - 3, \qquad \frac{\partial f(x_1,x_2)}{\partial x_2} = \frac{1}{2}x_1^{1/4} x_2^{-1/2} - 2.$$

2. Putting $*$'s on the $x_i$'s in 1 and setting these derivatives equal to $0$:
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_1} = \frac{1}{4}(x_1^*)^{-3/4}(x_2^*)^{1/2} - 3 = 0$$
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_2} = \frac{1}{2}(x_1^*)^{1/4}(x_2^*)^{-1/2} - 2 = 0.$$

3. We now attempt to solve the equations in 2 using the $\ln(\,)$ function to convert them into two linear equations as:
$$(x_1^*)^{-3/4}(x_2^*)^{1/2} = 12 \implies -\frac{3}{4}\ln(x_1^*) + \frac{1}{2}\ln(x_2^*) = \ln(12)$$
$$(x_1^*)^{1/4}(x_2^*)^{-1/2} = 4 \implies \frac{1}{4}\ln(x_1^*) - \frac{1}{2}\ln(x_2^*) = \ln(4)$$
and hence:
$$-\frac{3}{4}y_1^* + \frac{1}{2}y_2^* = \ln(12)$$
$$\frac{1}{4}y_1^* - \frac{1}{2}y_2^* = \ln(4)$$
where $y_1^* = \ln(x_1^*)$ and $y_2^* = \ln(x_2^*)$. Writing this in matrix notation we obtain:
$$\begin{bmatrix} -\frac{3}{4} & \frac{1}{2} \\ \frac{1}{4} & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} y_1^* \\ y_2^* \end{bmatrix} = \begin{bmatrix} \ln(12) \\ \ln(4) \end{bmatrix}.$$
Using Cramer's rule we find that:
$$\ln(x_1^*) = y_1^* = \frac{\det \begin{bmatrix} \ln(12) & \frac{1}{2} \\ \ln(4) & -\frac{1}{2} \end{bmatrix}}{\det \begin{bmatrix} -\frac{3}{4} & \frac{1}{2} \\ \frac{1}{4} & -\frac{1}{2} \end{bmatrix}} = -2\ln(12) - 2\ln(4)$$
$$\ln(x_2^*) = y_2^* = \frac{\det \begin{bmatrix} -\frac{3}{4} & \ln(12) \\ \frac{1}{4} & \ln(4) \end{bmatrix}}{\det \begin{bmatrix} -\frac{3}{4} & \frac{1}{2} \\ \frac{1}{4} & -\frac{1}{2} \end{bmatrix}} = -3\ln(4) - \ln(12).$$
Thus
$$x_1^* = e^{y_1^*} = e^{-2\ln(12) - 2\ln(4)} = \frac{1}{2304}$$
$$x_2^* = e^{y_2^*} = e^{-3\ln(4) - \ln(12)} = \frac{1}{768}.$$
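It is good practice to substitute a candidate solution back into the original first-order conditions. A plain-Python sketch (an addition, not part of the text) for Example 3:

```python
# Check that x1* = 1/2304, x2* = 1/768 satisfies both FOCs of
# Example 3 (a plain-Python sketch, not part of the original text).
x1, x2 = 1/2304, 1/768

foc1 = 0.25 * x1**(-0.75) * x2**0.5 - 3   # df/dx1 at the candidate
foc2 = 0.5 * x1**0.25 * x2**(-0.5) - 2    # df/dx2 at the candidate
print(foc1, foc2)  # both essentially zero, up to floating-point noise
```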
Example 4: Suppose a perfectly competitive firm has a technology $Q = F(L, K)$ which is globally concave. Profits expressed as a function of $L$ and $K$ are given by:
$$\pi(L, K) = P F(L, K) - WL - RK.$$
Following the recipe we have:

1. The first derivatives are:
$$\frac{\partial \pi(L,K)}{\partial L} = P\frac{\partial F(L,K)}{\partial L} - W, \qquad \frac{\partial \pi(L,K)}{\partial K} = P\frac{\partial F(L,K)}{\partial K} - R.$$

2. Putting $*$'s on $L$ and $K$ in 1 and setting these derivatives equal to $0$ results in:
$$\frac{\partial \pi(L^*, K^*)}{\partial L} = P\frac{\partial F(L^*,K^*)}{\partial L} - W = 0$$
$$\frac{\partial \pi(L^*, K^*)}{\partial K} = P\frac{\partial F(L^*,K^*)}{\partial K} - R = 0.$$

3. Given the level of generality there is no hope of explicitly solving for $L^*$ and $K^*$ here. We can nevertheless learn something about how a competitive firm chooses $L^*$ and $K^*$ since:
$$P\frac{\partial F(L^*,K^*)}{\partial L} - W = 0 \implies MP_L(L^*, K^*) = \frac{W}{P} \equiv w$$
$$P\frac{\partial F(L^*,K^*)}{\partial K} - R = 0 \implies MP_K(L^*, K^*) = \frac{R}{P} \equiv r$$
where $w$ and $r$ are the real wage rate and real rental cost of capital. Thus the competitive firm chooses $L^*$ and $K^*$ to equate the marginal products of labour and capital with the real wage $w$ and the real rental cost of capital $r$.
Example 5: Consider profit maximization in the long-run with a Cobb-Douglas production function:
$$Q = F(L, K) = L^{\frac{1}{2}} K^{\frac{1}{4}}$$
with $P = 8$, $W = 5$ and $R = 4$. The profit function is then:
$$\pi(L, K) = P F(L, K) - WL - RK$$
or:
$$\pi(L, K) = 8L^{\frac{1}{2}} K^{\frac{1}{4}} - 5L - 4K.$$
Following the recipe we have:

1. The first derivatives are:
$$\frac{\partial \pi(L,K)}{\partial L} = 4L^{-\frac{1}{2}}K^{\frac{1}{4}} - 5, \qquad \frac{\partial \pi(L,K)}{\partial K} = 2L^{\frac{1}{2}}K^{-\frac{3}{4}} - 4.$$

2. Putting $*$'s on $L$ and $K$ and setting these derivatives equal to $0$ results in:
$$\frac{\partial \pi(L^*,K^*)}{\partial L} = 4L^{*-\frac{1}{2}}K^{*\frac{1}{4}} - 5 = 0 \implies L^{*-\frac{1}{2}}K^{*\frac{1}{4}} = \frac{5}{4}$$
$$\frac{\partial \pi(L^*,K^*)}{\partial K} = 2L^{*\frac{1}{2}}K^{*-\frac{3}{4}} - 4 = 0 \implies L^{*\frac{1}{2}}K^{*-\frac{3}{4}} = 2.$$

3. We now attempt to solve using the $\ln(\,)$ function to convert these equations into two linear equations as:
$$L^{*-\frac{1}{2}}K^{*\frac{1}{4}} = \frac{5}{4} \implies -\frac{1}{2}\ln(L^*) + \frac{1}{4}\ln(K^*) = \ln\left(\frac{5}{4}\right)$$
$$L^{*\frac{1}{2}}K^{*-\frac{3}{4}} = 2 \implies \frac{1}{2}\ln(L^*) - \frac{3}{4}\ln(K^*) = \ln(2)$$
or in matrix notation as:
$$\begin{bmatrix} -\frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & -\frac{3}{4} \end{bmatrix}\begin{bmatrix} l^* \\ k^* \end{bmatrix} = \begin{bmatrix} \ln\left(\frac{5}{4}\right) \\ \ln(2) \end{bmatrix}$$
where $l^* = \ln(L^*)$ and $k^* = \ln(K^*)$. Solving these two equations by matrix inversion (or by using Cramer's rule) we obtain:
$$\begin{bmatrix} l^* \\ k^* \end{bmatrix} = \begin{bmatrix} -\frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & -\frac{3}{4} \end{bmatrix}^{-1}\begin{bmatrix} \ln\left(\frac{5}{4}\right) \\ \ln(2) \end{bmatrix} = \begin{bmatrix} -3\ln\left(\frac{5}{4}\right) - \ln(2) \\ -2\ln\left(\frac{5}{4}\right) - 2\ln(2) \end{bmatrix}.$$
Thus
$$L^* = e^{-3\ln\left(\frac{5}{4}\right) - \ln(2)} = \frac{32}{125}$$
$$K^* = e^{-2\ln\left(\frac{5}{4}\right) - 2\ln(2)} = \frac{4}{25}.$$

4.4.2 Second-Order Conditions
A solution $x_1^*, x_2^*, \dots, x_n^*$ to the first-order conditions can be either a maximum or a minimum. Clearly we want to be able to know if $x_1^*, x_2^*, \dots, x_n^*$ is a maximum or a minimum. For example if we are interested in profit maximization we do not want to be at a point which minimizes profits.

As with univariate calculus, the second-order conditions rely on the fact that a maximum occurs at the top of a mountain (concavity) while a minimum occurs at the bottom of a valley (convexity). We therefore use the matrix of second derivatives, or the Hessian $H(x_1, x_2, \dots, x_n)$, to determine if $x_1^*, x_2^*, \dots, x_n^*$ is a maximum or a minimum. As before the weaker condition of local concavity (convexity) yields the weaker result of a local maximum (minimum), while the stronger condition of global concavity (convexity) yields the stronger result of a global maximum (minimum). We have:

Theorem 261 Local Maximum: If $x_1^*, x_2^*, \dots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \dots, x_n)$ is locally concave at $x_1^*, x_2^*, \dots, x_n^*$ so that $H(x_1^*, x_2^*, \dots, x_n^*)$ is negative definite, then $x_1^*, x_2^*, \dots, x_n^*$ is a local maximum of $f(x_1, x_2, \dots, x_n)$.

Theorem 262 Local Minimum: If $x_1^*, x_2^*, \dots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \dots, x_n)$ is locally convex at $x_1^*, x_2^*, \dots, x_n^*$ so that $H(x_1^*, x_2^*, \dots, x_n^*)$ is positive definite, then $x_1^*, x_2^*, \dots, x_n^*$ is a local minimum of $f(x_1, x_2, \dots, x_n)$.

Theorem 263 If $x_1^*, x_2^*, \dots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \dots, x_n)$ is globally concave so that $H(x_1, x_2, \dots, x_n)$ is a negative definite matrix for all $x_1, x_2, \dots, x_n$, then $x_1^*, x_2^*, \dots, x_n^*$ is the unique global maximum of $f(x_1, x_2, \dots, x_n)$.

Theorem 264 If $x_1^*, x_2^*, \dots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \dots, x_n)$ is globally convex so that $H(x_1, x_2, \dots, x_n)$ is a positive definite matrix for all $x_1, x_2, \dots, x_n$, then $x_1^*, x_2^*, \dots, x_n^*$ is the unique global minimum of $f(x_1, x_2, \dots, x_n)$.
Example 1 (continued): For the function:
$$f(x_1, x_2) = 3x_1^2 - 6x_1 x_2 + 5x_2^2 - 4x_1 - 2x_2 + 8$$
we showed that $x_1^* = \frac{13}{6}$, $x_2^* = \frac{3}{2}$ is a solution to the first-order conditions. To determine if this is a maximum or a minimum we need:
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_1^2} = \frac{\partial}{\partial x_1}(6x_1 - 6x_2 - 4) = 6$$
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_2 \partial x_1} = \frac{\partial}{\partial x_2}(6x_1 - 6x_2 - 4) = -6$$
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_2^2} = \frac{\partial}{\partial x_2}(-6x_1 + 10x_2 - 2) = 10$$
so that the Hessian of $f(x_1, x_2)$ is:
$$H(x_1, x_2) = \begin{bmatrix} 6 & -6 \\ -6 & 10 \end{bmatrix}.$$
Note for this particular problem the Hessian does not depend on $x_1$ and $x_2$.
Now from the leading principal minors we have:
$$M_1 = \det[6] = 6 > 0$$
$$M_2 = \det \begin{bmatrix} 6 & -6 \\ -6 & 10 \end{bmatrix} = 24 > 0.$$
It follows then that $H(x_1, x_2)$ is positive definite for all $x_1, x_2$ and hence $f(x_1, x_2)$ is globally convex. Therefore $x_1^* = \frac{13}{6}$, $x_2^* = \frac{3}{2}$ is a global minimum.
Example 2 (continued): For the function:
$$y = f(x_1, x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 - x_2$$
we showed that the solution to the first-order conditions is $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$. The Hessian is given by:
$$H(x_1, x_2) = \begin{bmatrix} -\frac{1}{3x_1^2} & 0 \\ 0 & -\frac{2}{3x_2^2} \end{bmatrix}.$$
At $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$ we have:
$$H\left(\frac{1}{3}, \frac{2}{3}\right) = \begin{bmatrix} -\frac{1}{3(1/3)^2} & 0 \\ 0 & -\frac{2}{3(2/3)^2} \end{bmatrix} = \begin{bmatrix} -3 & 0 \\ 0 & -\frac{3}{2} \end{bmatrix}$$
which is a negative definite matrix (since $M_1 = -3 < 0$ and $M_2 = \frac{9}{2} > 0$), so that it follows that $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$ is a local maximum.
We can in fact make the stronger statement that $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$ is a global maximum, since the Hessian is a negative definite matrix for all $x_1, x_2$ because it is a diagonal matrix with negative elements along the diagonal.
Example 3 (continued): For the function:
$$f(x_1, x_2) = x_1^{1/4} x_2^{1/2} - 3x_1 - 2x_2$$
we showed that the solution to the first-order conditions is $x_1^* = \frac{1}{2304}$, $x_2^* = \frac{1}{768}$. We have:
$$\frac{\partial^2 f(x_1,x_2)}{\partial x_1^2} = -\frac{3}{16}x_1^{-7/4} x_2^{1/2}, \quad \frac{\partial^2 f(x_1,x_2)}{\partial x_2^2} = -\frac{1}{4}x_1^{1/4} x_2^{-3/2}, \quad \frac{\partial^2 f(x_1,x_2)}{\partial x_1 \partial x_2} = \frac{1}{8}x_1^{-3/4} x_2^{-1/2}$$
so that the Hessian is given by:
$$H(x_1, x_2) = \begin{bmatrix} -\frac{3}{16}x_1^{-7/4} x_2^{1/2} & \frac{1}{8}x_1^{-3/4} x_2^{-1/2} \\ \frac{1}{8}x_1^{-3/4} x_2^{-1/2} & -\frac{1}{4}x_1^{1/4} x_2^{-3/2} \end{bmatrix}.$$
Substituting $x_1^* = \frac{1}{2304}$ and $x_2^* = \frac{1}{768}$ into $H(x_1, x_2)$ we find that:
$$H\left(\frac{1}{2304}, \frac{1}{768}\right) = \begin{bmatrix} -5184 & 1152 \\ 1152 & -768 \end{bmatrix}.$$
This matrix is negative definite from the leading principal minors since:
$$M_1 = -5184 < 0$$
$$M_2 = \det \begin{bmatrix} -5184 & 1152 \\ 1152 & -768 \end{bmatrix} = 2654208 > 0.$$
It follows then that $x_1^* = \frac{1}{2304}$, $x_2^* = \frac{1}{768}$ is a local maximum.
We can in fact prove more, that $x_1^* = \frac{1}{2304}$ and $x_2^* = \frac{1}{768}$ is a global maximum, by showing that $f(x_1, x_2)$ is globally concave. To do this we need to show that $H(x_1, x_2)$ is negative definite for all $x_1, x_2$. Using leading principal minors we have:
$$M_1 = -\frac{3}{16}x_1^{-7/4} x_2^{1/2} < 0$$
$$M_2 = \det \begin{bmatrix} -\frac{3}{16}x_1^{-7/4} x_2^{1/2} & \frac{1}{8}x_1^{-3/4} x_2^{-1/2} \\ \frac{1}{8}x_1^{-3/4} x_2^{-1/2} & -\frac{1}{4}x_1^{1/4} x_2^{-3/2} \end{bmatrix} = \frac{3}{64}x_1^{-3/2} x_2^{-1} - \frac{1}{64}x_1^{-3/2} x_2^{-1} = \frac{1}{32}x_1^{-3/2} x_2^{-1} > 0$$
so that $H(x_1, x_2)$ is a negative definite matrix for all $x_1$ and $x_2$. Thus $f(x_1, x_2)$ is globally concave and $x_1^* = \frac{1}{2304}$, $x_2^* = \frac{1}{768}$ is the unique global maximum.
Example 4 (continued): Given the problem of maximizing profits:
$$\pi(L, K) = P F(L, K) - WL - RK$$
where the production function $Q = F(L, K)$ is globally concave so that its Hessian:
$$H_F(L, K) = \begin{bmatrix} \frac{\partial^2 F(L,K)}{\partial L^2} & \frac{\partial^2 F(L,K)}{\partial L \partial K} \\ \frac{\partial^2 F(L,K)}{\partial L \partial K} & \frac{\partial^2 F(L,K)}{\partial K^2} \end{bmatrix}$$
is negative definite for all $L$ and $K$. The Hessian of the profit function is then:
$$H_\pi(L, K) = \begin{bmatrix} \frac{\partial^2 \pi(L,K)}{\partial L^2} & \frac{\partial^2 \pi(L,K)}{\partial L \partial K} \\ \frac{\partial^2 \pi(L,K)}{\partial L \partial K} & \frac{\partial^2 \pi(L,K)}{\partial K^2} \end{bmatrix} = \begin{bmatrix} P\frac{\partial^2 F(L,K)}{\partial L^2} & P\frac{\partial^2 F(L,K)}{\partial L \partial K} \\ P\frac{\partial^2 F(L,K)}{\partial L \partial K} & P\frac{\partial^2 F(L,K)}{\partial K^2} \end{bmatrix} = P H_F(L, K).$$
Since $H_F(L, K)$ is negative definite for all $(L, K)$ (since $F(L, K)$ is concave), and since $P > 0$, it follows that $H_\pi(L, K)$ is also negative definite for all $(L, K)$. Thus $\pi(L, K)$ is globally concave and hence the $L^*, K^*$ which solves the first-order conditions is the unique global maximum.
Example 5 (continued): Consider the long-run profit maximization problem with:
$$\pi(L, K) = 8L^{\frac{1}{2}} K^{\frac{1}{4}} - 5L - 4K.$$
We showed that the solution to the first-order conditions is $L^* = \frac{32}{125}$, $K^* = \frac{4}{25}$. We would like to show that this is in fact a global profit maximum. The Hessian of $\pi(L, K)$ is:
$$H(L, K) = \begin{bmatrix} -2L^{-\frac{3}{2}}K^{\frac{1}{4}} & L^{-\frac{1}{2}}K^{-\frac{3}{4}} \\ L^{-\frac{1}{2}}K^{-\frac{3}{4}} & -\frac{3}{2}L^{\frac{1}{2}}K^{-\frac{7}{4}} \end{bmatrix}.$$
Using leading principal minors we have:
$$M_1 = -2L^{-\frac{3}{2}}K^{\frac{1}{4}} < 0$$
$$M_2 = 3L^{-1}K^{-\frac{3}{2}} - L^{-1}K^{-\frac{3}{2}} = 2L^{-1}K^{-\frac{3}{2}} > 0.$$
Thus $H(L, K)$ is negative definite for all $L$ and $K$, so that $\pi(L, K)$ is globally concave and hence $L^* = \frac{32}{125}$ and $K^* = \frac{4}{25}$ is a global maximum.
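Both the first-order conditions and the maximality claim of Example 5 can be sanity-checked numerically; the plain-Python sketch below (an addition, not part of the text) evaluates the FOCs at $(L^*, K^*)$ and compares profit there against nearby points:

```python
# Numeric sanity check of Example 5 (a plain-Python sketch,
# not part of the original text).
def profit(L, K):
    return 8 * L**0.5 * K**0.25 - 5*L - 4*K

L_star, K_star = 32/125, 4/25

foc_L = 4 * L_star**(-0.5) * K_star**0.25 - 5   # d(pi)/dL at (L*, K*)
foc_K = 2 * L_star**0.5 * K_star**(-0.75) - 4   # d(pi)/dK at (L*, K*)
print(foc_L, foc_K)  # both essentially zero

best = profit(L_star, K_star)
print(best > profit(0.3, 0.2), best > profit(0.2, 0.1))
```

A grid search is of course no substitute for the global-concavity argument above, but it is a cheap guard against algebra slips.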
4.5 Quasi-Concavity and Quasi-Convexity

4.5.1 Ordinal and Cardinal Properties

Just as with univariate functions, multivariate functions have both ordinal and cardinal properties, defined in exactly the same manner:

Definition 265 Ordinal Property: An ordinal property of a function $f(x_1, x_2, \dots, x_n)$ is one which remains unchanged when any monotonic transformation $g(x)$ is applied; that is, both $f(x_1, x_2, \dots, x_n)$ and $g(f(x_1, x_2, \dots, x_n))$ share the property for any $g(x)$ with $g'(x) > 0$.

Definition 266 Cardinal Property: A cardinal property of a function $f(x_1, x_2, \dots, x_n)$ is one which does change when a monotonic transformation is applied.

Just as with univariate functions, the global maximum or minimum is an ordinal property while concavity or convexity is a cardinal property.

Theorem 267 A Global Maximum or Minimum $x_1^*, x_2^*, \dots, x_n^*$ is an Ordinal Property; that is, $x_1^*, x_2^*, \dots, x_n^*$ is a global maximum or minimum of $f(x_1, x_2, \dots, x_n)$ if and only if it is also a global maximum or minimum of $g(f(x_1, x_2, \dots, x_n))$ with $g'(x) > 0$.

Theorem 268 Concavity and Convexity are Cardinal Properties: If $f(x_1, x_2, \dots, x_n)$ is globally concave or convex it does not follow that $g(f(x_1, x_2, \dots, x_n))$ (with $g'(x) > 0$) is globally concave or convex.
Again this leads us to define a weaker notion of concavity or convexity which is an ordinal property:

Definition 269 Quasi-Concavity: A function $f(x_1, x_2, \dots, x_n)$ is quasi-concave if and only if it can be written as:
$$f(x_1, x_2, \dots, x_n) = g(h(x_1, x_2, \dots, x_n))$$
with $g'(x) > 0$ and where $h(x_1, x_2, \dots, x_n)$ is globally concave.

Definition 270 Quasi-Convexity: A function $f(x_1, x_2, \dots, x_n)$ is quasi-convex if and only if it can be written as:
$$f(x_1, x_2, \dots, x_n) = g(h(x_1, x_2, \dots, x_n))$$
with $g'(x) > 0$ and where $h(x_1, x_2, \dots, x_n)$ is globally convex.

We have:

Theorem 271 Quasi-Concavity and Quasi-Convexity are Ordinal Properties.

Just as with univariate functions, one can show that a given function is quasi-concave (quasi-convex) by finding a monotonic transformation $g(x)$ which makes the function concave (convex). Thus:

Theorem 272 A function $f(x_1, x_2, \dots, x_n)$ is quasi-concave (quasi-convex) if and only if there exists a monotonic transformation $g(x)$ such that
$$h(x_1, x_2, \dots, x_n) = g(f(x_1, x_2, \dots, x_n))$$
is globally concave (globally convex).
Example: Consider the function:
$$f(x_1, x_2) = x_1^2 x_2^4$$
for $x_1 > 0$ and $x_2 > 0$. You can verify that the Hessian $H_f(x_1, x_2)$ of $f(x_1, x_2)$ is given by:
$$H_f(x_1, x_2) = \begin{bmatrix} 2x_2^4 & 8x_1 x_2^3 \\ 8x_1 x_2^3 & 12x_1^2 x_2^2 \end{bmatrix}.$$
The function $x_1^2 x_2^4$ is not concave since the diagonal elements of $H_f(x_1, x_2)$ are both positive. The function $x_1^2 x_2^4$ is also not convex since $H_f(x_1, x_2)$ is not positive definite, since:
$$M_2 = \det[H_f(x_1, x_2)] = \det \begin{bmatrix} 2x_2^4 & 8x_1 x_2^3 \\ 8x_1 x_2^3 & 12x_1^2 x_2^2 \end{bmatrix} = -40x_1^2 x_2^6 < 0.$$
We can however show that $x_1^2 x_2^4$ is quasi-concave. We will do this two different ways. First we show that $f(x_1, x_2)$ is a monotonic function of a concave function since:
$$f(x_1, x_2) = x_1^2 x_2^4 = e^{2\ln(x_1) + 4\ln(x_2)}$$
so that the monotonic transformation is $g(x) = e^x$ (with $g'(x) = e^x > 0$) and the concave function is:
$$h(x_1, x_2) = 2\ln(x_1) + 4\ln(x_2)$$
since the Hessian of $h(x_1, x_2)$ is:
$$H_h(x_1, x_2) = \begin{bmatrix} -\frac{2}{x_1^2} & 0 \\ 0 & -\frac{4}{x_2^2} \end{bmatrix}$$
which is negative definite for all $(x_1, x_2)$ (since it is a diagonal matrix with negative diagonal elements). We conclude then that $x_1^2 x_2^4$ is quasi-concave.
Now we show that $f(x_1, x_2)$ is quasi-concave by finding a monotonic transformation $g(x)$ which transforms $f(x_1, x_2)$ into a concave function. To this end let:
$$g(x) = \ln(x) \quad \text{with} \quad g'(x) = \frac{1}{x} > 0$$
so that:
$$h(x_1, x_2) = g(f(x_1, x_2)) = \ln\left(x_1^2 x_2^4\right) = 2\ln(x_1) + 4\ln(x_2).$$
We have already shown that $h(x_1, x_2)$ is globally concave and so the quasi-concavity of $x_1^2 x_2^4$ follows.
4.5.2 Sufficient Conditions for a Global Maximum or Minimum

To ensure that a solution to the first-order conditions $x_1^*, x_2^*, \dots, x_n^*$ is a global maximum (minimum) we do not necessarily need concavity (convexity); we only need the weaker conditions of quasi-concavity (quasi-convexity). In particular:

Theorem 273 Suppose that $x_1^*, x_2^*, \dots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \dots, x_n)$ is quasi-concave; then $x_1^*, x_2^*, \dots, x_n^*$ is the unique global maximum of $f(x_1, x_2, \dots, x_n)$.

Theorem 274 Suppose that $x_1^*, x_2^*, \dots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \dots, x_n)$ is quasi-convex; then $x_1^*, x_2^*, \dots, x_n^*$ is the unique global minimum of $f(x_1, x_2, \dots, x_n)$.
Example: Consider a scaled version of the bivariate standard normal density:
$$f(x_1, x_2) = e^{-\frac{1}{2}(x_1^2 + x_2^2)}.$$
To find the first-order conditions we first calculate:
$$\frac{\partial f(x_1,x_2)}{\partial x_1} = -x_1 e^{-\frac{1}{2}(x_1^2 + x_2^2)}, \qquad \frac{\partial f(x_1,x_2)}{\partial x_2} = -x_2 e^{-\frac{1}{2}(x_1^2 + x_2^2)}.$$
Now we put $*$'s on the $x_i$'s and solve for $x_1^*$ and $x_2^*$ as:
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_1} = -x_1^* e^{-\frac{1}{2}(x_1^{*2} + x_2^{*2})} = 0 \implies x_1^* = 0$$
$$\frac{\partial f(x_1^*, x_2^*)}{\partial x_2} = -x_2^* e^{-\frac{1}{2}(x_1^{*2} + x_2^{*2})} = 0 \implies x_2^* = 0.$$
We would like to show that $x_1^* = 0$, $x_2^* = 0$ is a global maximum.
The Hessian of $f(x_1, x_2)$ is:
$$H(x_1, x_2) = e^{-\frac{1}{2}(x_1^2 + x_2^2)}\begin{bmatrix} x_1^2 - 1 & x_1 x_2 \\ x_1 x_2 & x_2^2 - 1 \end{bmatrix}$$
and for $x_1^* = 0$, $x_2^* = 0$ we have:
$$H(0, 0) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$
which is negative definite, and so we conclude that $x_1^* = 0$, $x_2^* = 0$ is a local maximum.
We cannot use concavity to show that $x_1^* = 0$, $x_2^* = 0$ is a global maximum since at $x_1 = 2$, $x_2 = 2$ we have:
$$H(2, 2) = e^{-4}\begin{bmatrix} 3 & 4 \\ 4 & 3 \end{bmatrix}$$
which is not negative definite (since it has positive elements on the diagonal). It follows that $f(x_1, x_2)$ is not concave.
We can show that $f(x_1, x_2)$ is quasi-concave. Note that:
$$f(x_1, x_2) = e^{-\frac{1}{2}(x_1^2 + x_2^2)} = g(h(x_1, x_2))$$
with monotonic function $g(x) = e^x$ and concave function:
$$h(x_1, x_2) = -\frac{1}{2}\left(x_1^2 + x_2^2\right)$$
since its Hessian is given by:
$$\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$
which is a diagonal matrix with negative elements along the diagonal and hence is negative definite for all $x_1, x_2$. It follows then that $f(x_1, x_2)$ is quasi-concave and hence that $x_1^* = 0$, $x_2^* = 0$ is a global maximum for both $f(x_1, x_2)$ and $h(x_1, x_2)$.
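The two halves of this argument are easy to see numerically: the Hessian of $f$ at $(2, 2)$ is indefinite, yet the maximizer at the origin survives because a monotonic transformation cannot move it. A numpy sketch (numpy assumed available; not part of the text):

```python
# Numeric illustration of the quasi-concavity example (a sketch,
# assuming numpy is available; not part of the original text).
import numpy as np

def f(x1, x2):
    return np.exp(-0.5 * (x1**2 + x2**2))

# Hessian of f at (2, 2), from the text: e^{-4} [[3, 4], [4, 3]].
# One eigenvalue is positive, so f is not concave.
H_f = np.exp(-4.0) * np.array([[3.0, 4.0], [4.0, 3.0]])
print(np.linalg.eigvalsh(H_f))  # one negative, one positive eigenvalue

# ln(f) = -(x1^2 + x2^2)/2 has Hessian -I everywhere (globally
# concave), so f is quasi-concave and (0, 0) maximizes both.
print(f(0.0, 0.0))  # 1.0, larger than f at any other point
```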
4.5.3 Indifference Curves and Quasi-Concavity

Suppose a household has a utility function:
$$U(Q_1, Q_2)$$
where the marginal utilities of goods 1 and 2 are:
$$\frac{\partial U(Q_1, Q_2)}{\partial Q_1} \equiv U_1 > 0, \qquad \frac{\partial U(Q_1, Q_2)}{\partial Q_2} \equiv U_2 > 0$$
and the second derivatives are:
$$U_{11} = \frac{\partial^2 U(Q_1, Q_2)}{\partial Q_1^2}, \quad U_{22} = \frac{\partial^2 U(Q_1, Q_2)}{\partial Q_2^2}, \quad U_{12} = \frac{\partial^2 U(Q_1, Q_2)}{\partial Q_1 \partial Q_2}.$$
We define an indifference curve from the utility function as follows:

Definition 275 Indifference Curve: An indifference curve corresponding to utility level $c$, written as $Q_2 = f(Q_1)$, is all combinations of $Q_1, Q_2$ which yield $c$ units of utility, or:
$$U(Q_1, f(Q_1)) = c.$$
We define the slope of the indifference curve as:

Definition 276 Marginal Rate of Substitution: The marginal rate of substitution is:
$$MRS(Q_1, Q_2) = f'(Q_1).$$

Definition 277 We say that the indifference curve has a diminishing marginal rate of substitution if $f''(Q_1) > 0$.
Example: Given the Cobb-Douglas utility function:
$$U(Q_1, Q_2) = Q_1^2 Q_2^4$$
then an indifference curve which yields 5 units of utility is defined as:
$$U(Q_1, Q_2) = Q_1^2 Q_2^4 = 5 \implies Q_2 = 5^{\frac{1}{4}} Q_1^{-\frac{1}{2}}$$
so that the indifference curve is $Q_2 = 5^{\frac{1}{4}} Q_1^{-\frac{1}{2}}$. This indifference curve is plotted below:

[Figure: indifference curve $Q_2 = 5^{1/4} Q_1^{-1/2}$ in the $(Q_1, Q_2)$ plane]

Note that this indifference curve has the correct shape: it is downward sloping and convex, or bent towards the origin. It therefore exhibits a diminishing marginal rate of substitution.
You can verify that this is true for all indifference curves, where the indifference curve corresponding to $c$ units of utility is:
$$Q_2 = f(Q_1) = c^{\frac{1}{4}} Q_1^{-\frac{1}{2}}.$$
We can show that all indifference curves slope downwards, and show that the slope or marginal rate of substitution is equal to the negative of the ratio of the marginal utilities:

Theorem 278 Given a utility function $U(Q_1, Q_2)$ the marginal rate of substitution is:
$$f'(Q_1) = -\frac{\frac{\partial U(Q_1,Q_2)}{\partial Q_1}}{\frac{\partial U(Q_1,Q_2)}{\partial Q_2}} = -\frac{U_1}{U_2} < 0.$$

Proof. An indifference curve is defined as:
$$U(Q_1, f(Q_1)) = c.$$
Let $U(Q_1, Q_2)$ be the outside function and let $g_1(Q_1) = Q_1$ and $g_2(Q_1) = f(Q_1)$ be the two inside functions. Differentiating both sides with respect to $Q_1$ and using the chain rule we find that:
$$\frac{\partial U(g_1(Q_1), g_2(Q_1))}{\partial Q_1} g_1'(Q_1) + \frac{\partial U(g_1(Q_1), g_2(Q_1))}{\partial Q_2} g_2'(Q_1) = 0.$$
Since $g_1'(Q_1) = 1$ and $g_2'(Q_1) = f'(Q_1)$ we have:
$$\frac{\partial U(Q_1, f(Q_1))}{\partial Q_1} + \frac{\partial U(Q_1, f(Q_1))}{\partial Q_2} f'(Q_1) = 0$$
and since $Q_2 = f(Q_1)$ we can write this as:
$$\frac{\partial U(Q_1, Q_2)}{\partial Q_1} + \frac{\partial U(Q_1, Q_2)}{\partial Q_2} f'(Q_1) = 0$$
so that solving for $f'(Q_1)$ and using $U_1 > 0$ and $U_2 > 0$ the result follows.
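The MRS formula of Theorem 278 can be verified symbolically for the Cobb-Douglas example above; the sympy sketch below (sympy assumed available, not part of the text) also checks it against the slope of the explicit indifference curve:

```python
# Symbolic check of f'(Q1) = -U1/U2 for U = Q1^2 Q2^4 (a sketch,
# assuming sympy is available; not part of the original text).
import sympy as sp

Q1, Q2 = sp.symbols('Q1 Q2', positive=True)
U = Q1**2 * Q2**4

# -U1/U2, the MRS from Theorem 278
mrs = -sp.diff(U, Q1) / sp.diff(U, Q2)
print(sp.simplify(mrs))  # simplifies to -Q2/(2*Q1), which is negative

# Same slope from the explicit indifference curve Q2 = c^(1/4) Q1^(-1/2)
c = sp.symbols('c', positive=True)
curve = c**sp.Rational(1, 4) * Q1**(-sp.Rational(1, 2))
slope = sp.diff(curve, Q1)
print(sp.simplify(slope - mrs.subs(Q2, curve)))  # 0
```

That the difference simplifies to zero is exactly the content of the chain-rule proof above, done by machine instead of by hand.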
Now suppose we take a monotonic transformation of $U(Q_1, Q_2)$ as:
$$\tilde{U}(Q_1, Q_2) = g(U(Q_1, Q_2))$$
where $g'(x) > 0$. We might then ask what kind of utility function $\tilde{U}(Q_1, Q_2)$ is. The quite surprising answer is that for all practical purposes $U(Q_1, Q_2)$ and $\tilde{U}(Q_1, Q_2)$ are the same utility function! Actual economic behavior depends on the indifference curves, and the indifference curves of $U(Q_1, Q_2)$ and $\tilde{U}(Q_1, Q_2)$ are identical; that is:
$$U(Q_1, Q_2) = c \Longleftrightarrow \tilde{U}(Q_1, Q_2) = g(c),$$
or all combinations $Q_1, Q_2$ which yield $c$ units of utility given $U(Q_1, Q_2)$ also yield $g(c)$ units of utility given $\tilde{U}(Q_1, Q_2)$. The only difference is that with one utility function the indifference curve has the utility number $c$ associated with it while the other has the utility number $g(c)$ associated with it. Actual economic behavior, though, does not depend on what utility numbers we attach to each indifference curve, and so $U(Q_1, Q_2)$ and $\tilde{U}(Q_1, Q_2)$ both represent the same preferences and hence the same economic behavior.
We thus have:

Theorem 279 An indifference curve of a utility function $U(Q_1, Q_2)$ is an ordinal property of $U(Q_1, Q_2)$.
Example: Given the Cobb-Douglas utility function:
$$U(Q_1, Q_2) = Q_1^2 Q_2^4,$$
if we transform it with $g(x) = \ln(x)$ (with $g'(x) = \frac{1}{x} > 0$) then
$$\tilde{U}(Q_1, Q_2) = \ln\left(Q_1^2 Q_2^4\right) = 2\ln(Q_1) + 4\ln(Q_2)$$
is an equivalent utility function and has the same indifference curves. Thus if we calculate the indifference curve for $\tilde{U}(Q_1, Q_2)$ which yields $\ln(5)$ units of utility we obtain:
$$\tilde{U}(Q_1, Q_2) = 2\ln(Q_1) + 4\ln(Q_2) = \ln(5) \Longrightarrow Q_1^2 Q_2^4 = 5 \Longrightarrow Q_2 = 5^{1/4} Q_1^{-1/2},$$
which is the identical indifference curve for 5 units of utility for $U(Q_1, Q_2) = Q_1^2 Q_2^4$ that we calculated above.
In fact all of the utility functions below lead to the same indifference curves:
$$Q_1^2 Q_2^4, \quad Q_1^{1/3} Q_2^{2/3}, \quad e^{Q_1^{1/5} Q_2^{2/5}}, \quad \frac{1}{3}\ln(Q_1) + \frac{2}{3}\ln(Q_2), \quad 2\ln(Q_1) + 4\ln(Q_2).$$
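One way to see this ordinal equivalence is to check that every function in the list implies the same marginal rate of substitution. The sketch below (my verification, with the exponents as reconstructed above) does this symbolically:

```python
import sympy as sp

Q1, Q2 = sp.symbols('Q1 Q2', positive=True)

# The five candidate utility functions from the text
utilities = [
    Q1**2 * Q2**4,
    Q1**sp.Rational(1, 3) * Q2**sp.Rational(2, 3),
    sp.exp(Q1**sp.Rational(1, 5) * Q2**sp.Rational(2, 5)),
    sp.Rational(1, 3)*sp.log(Q1) + sp.Rational(2, 3)*sp.log(Q2),
    2*sp.log(Q1) + 4*sp.log(Q2),
]

# MRS = -U1/U2 is an ordinal property: it should coincide for every transform
mrs = [sp.simplify(-sp.diff(U, Q1) / sp.diff(U, Q2)) for U in utilities]
assert all(sp.simplify(m - mrs[0]) == 0 for m in mrs)
```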
The question now is: under what circumstances do indifference curves exhibit a diminishing marginal rate of substitution? A good first guess would be concavity of the utility function. Although this is sufficient it cannot be necessary. For example we have seen that $U(Q_1, Q_2) = Q_1^2 Q_2^4$ exhibits a diminishing marginal rate of substitution even though it is not concave, since concavity requires that the diagonal elements of the Hessian be negative while:
$$U_{11} = 2Q_2^4 > 0, \quad U_{22} = 12Q_1^2 Q_2^2 > 0.$$
A necessary and sufficient condition is in fact quasi-concavity. We have:
Theorem 280 The utility function $U(Q_1, Q_2)$ exhibits a diminishing marginal rate of substitution if and only if it is quasi-concave.
Example: Despite not being concave, the utility function $U(Q_1, Q_2) = Q_1^2 Q_2^4$ is quasi-concave since:
$$\tilde{U}(Q_1, Q_2) = \ln\left(Q_1^2 Q_2^4\right) = 2\ln(Q_1) + 4\ln(Q_2)$$
and $\tilde{U}(Q_1, Q_2)$ is globally concave.
We can obtain the following calculus test for the quasi-concavity of the utility function:

Theorem 281 A utility function $U(Q_1, Q_2)$ with $U_1 > 0$ and $U_2 > 0$ is quasi-concave if and only if $\det[H] > 0$ where
$$\det[H] = \det\begin{bmatrix} 0 & U_1 & U_2 \\ U_1 & U_{11} & U_{12} \\ U_2 & U_{12} & U_{22} \end{bmatrix} = -U_2^2 U_{11} + 2U_{12} U_1 U_2 - U_1^2 U_{22}.$$
Proof. Using the multivariate chain rule on
$$\frac{\partial U(Q_1, f(Q_1))}{\partial Q_1} + \frac{\partial U(Q_1, f(Q_1))}{\partial Q_2} f'(Q_1) = 0$$
we find that:
$$U_{11} + 2U_{12} f'(Q_1) + U_{22} f'(Q_1)^2 + U_2 f''(Q_1) = 0.$$
Substituting $f'(Q_1) = -\frac{U_1}{U_2}$ we obtain:
$$U_{11} - 2U_{12}\frac{U_1}{U_2} + U_{22}\left(\frac{U_1}{U_2}\right)^2 + U_2 f''(Q_1) = 0$$
from which it follows that:
$$f''(Q_1) = -\frac{1}{U_2}\left(U_{11} - 2U_{12}\frac{U_1}{U_2} + U_{22}\left(\frac{U_1}{U_2}\right)^2\right) = \frac{1}{U_2^3}\left(-U_2^2 U_{11} + 2U_{12} U_1 U_2 - U_1^2 U_{22}\right) = \frac{1}{U_2^3}\det[H].$$
Since $U_2 > 0$ it follows that $f''(Q_1) > 0$ if and only if $\det[H] > 0$.
Remark 1: This matrix is sometimes referred to as the bordered Hessian. It contains the ordinary Hessian of $U(Q_1, Q_2)$ in the lower right-hand corner and is bordered by the first derivatives on either side, with a 0 in the upper left-hand corner.
Example: For
$$U(Q_1, Q_2) = Q_1^2 Q_2^4$$
the bordered Hessian is:
$$H = \begin{bmatrix} 0 & 2Q_1 Q_2^4 & 4Q_1^2 Q_2^3 \\ 2Q_1 Q_2^4 & 2Q_2^4 & 8Q_1 Q_2^3 \\ 4Q_1^2 Q_2^3 & 8Q_1 Q_2^3 & 12Q_1^2 Q_2^2 \end{bmatrix}$$
and (with some work) you can show that:
$$\det[H] = 48Q_1^4 Q_2^{10} > 0$$
so that, as we already knew, $U(Q_1, Q_2)$ is quasi-concave.
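The "some work" can be delegated to a computer algebra system; this sketch (my addition, not part of the text) rebuilds the bordered Hessian from the derivatives and confirms the determinant:

```python
import sympy as sp

Q1, Q2 = sp.symbols('Q1 Q2', positive=True)
U = Q1**2 * Q2**4

# Border with the first derivatives, ordinary Hessian in the lower right corner
U1, U2 = sp.diff(U, Q1), sp.diff(U, Q2)
H = sp.Matrix([
    [0,  U1,                 U2],
    [U1, sp.diff(U, Q1, 2),  sp.diff(U, Q1, Q2)],
    [U2, sp.diff(U, Q1, Q2), sp.diff(U, Q2, 2)],
])

detH = sp.expand(H.det())
assert detH == 48*Q1**4*Q2**10   # > 0 for Q1, Q2 > 0, so U is quasi-concave
```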
4.6 Constrained Optimization
Economics, whether normative or positive, has not simply been the study of the allocation of scarce resources, it has been the study of the rational allocation of scarce resources. -Herbert A. Simon

Typically in economics when rational agents attempt to maximize profits or utility, or to minimize costs or expenditure, they are not free to choose any values of the variables they control. Instead they face some constraint that restricts the choices they can make. This is because economics is about scarcity, and scarcity imposes constraints on economies and agents. For example a household maximizing utility cannot choose any bundle of goods it might want, but can only choose from amongst those bundles that it can afford; that is, those which satisfy the household's budget constraint.
This leads to a new kind of optimization problem from what we have considered so far: instead of working directly with the objective function $f(x_1, x_2, \dots, x_n)$ we construct a new function, the Lagrangian:
$$L(\lambda, x_1, x_2, \dots, x_n)$$
and work with this function instead.
Economists work with Lagrangians all the time. In a way it is the most important mathematical technique for you to learn if you want to go on in economics.
4.6.1 The Lagrangian
Suppose we wish to maximize or minimize a multivariate function
$$f(x_1, x_2, \dots, x_n)$$
subject to a constraint. The constraint is written as:
$$g(x_1, x_2, \dots, x_n) = 0.$$
This means that in maximizing or minimizing $f(x_1, x_2, \dots, x_n)$ we can only choose those $x_1, x_2, \dots, x_n$ which make $g(x_1, x_2, \dots, x_n)$ equal to zero.
To do this we construct the Lagrangian, which is a function of $n+1$ variables: the Lagrange multiplier $\lambda$, which is a scalar, and the $n$ components $x_1, x_2, \dots, x_n$. We have:
Definition 282 Corresponding to the problem of maximizing or minimizing the objective function:
$$f(x_1, x_2, \dots, x_n)$$
subject to the constraint
$$g(x_1, x_2, \dots, x_n) = 0$$
is the Lagrangian given by:
$$L(\lambda, x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n) + \lambda g(x_1, x_2, \dots, x_n).$$
At the beginning students often make mistakes in constructing the Lagrangian. To avoid these errors consider the following step-by-step recipe:

A Recipe for Constructing the Lagrangian

1. Identify the objective function, the function to be maximized or minimized: $f(x_1, x_2, \dots, x_n)$.
2. Identify the constraint and, if necessary, rewrite the constraint in the form:
$$\_\_\_\_\_\_ = 0.$$
3. Write the Lagrangian function using $L$ with the first argument the Lagrange multiplier $\lambda$ followed by the $x_i$'s. We thus write:
$$L(\lambda, x_1, x_2, \dots, x_n) =$$
4. After the equality sign in 3. write the objective function $f(x_1, x_2, \dots, x_n)$ followed by $+\lambda$ and then brackets as:
$$L(\lambda, x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n) + \lambda(\quad).$$
5. Inside the brackets in 4. put the expression on the left-hand side of the constraint written as $\_\_\_ = 0$ in step 2:
$$L(\lambda, x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n) + \lambda\bigl(\underbrace{\_\_\_\_\_\_}_{\text{left-hand side of }\_=0\text{ from 2.}}\bigr)$$
which then gives the Lagrangian.
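The recipe above lends itself to automation. As a sketch (the `make_lagrangian` helper is my own illustration, not part of the text), one can assemble the Lagrangian mechanically:

```python
import sympy as sp

lam, x1, x2 = sp.symbols('lambda x1 x2')

def make_lagrangian(objective, constraint_lhs, constraint_rhs):
    """Follow the recipe: rewrite the constraint as g = 0, then L = f + lambda*g."""
    g = constraint_rhs - constraint_lhs   # step 2: the ____ = 0 form
    return objective + lam * g            # steps 4-5

# Example 1 from the text: minimize x1**2 + x2**2/2 subject to x1 + x2 = 1
L = make_lagrangian(x1**2 + x2**2/2, x1 + x2, 1)
assert sp.simplify(L - (x1**2 + x2**2/2 + lam*(1 - x1 - x2))) == 0
```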
Example 1: Consider the problem of minimizing
$$x_1^2 + \frac{1}{2}x_2^2$$
subject to the constraint that $x_1$ and $x_2$ sum to 1, or that:
$$x_1 + x_2 = 1.$$
1. We first identify what is the constraint and what is to be minimized. Here a typical error would be to confuse $x_1 + x_2$, which forms part of the constraint, with $x_1^2 + \frac{1}{2}x_2^2$, which is the objective function. The objective function is what we wish to minimize:
$$f(x_1, x_2) = x_1^2 + \frac{1}{2}x_2^2.$$
2. The constraint is that $x_1 + x_2 = 1$. We need to rewrite this as $\_\_\_ = 0$. This is easily done by putting $x_1 + x_2$ on the other side of the equal sign as:
$$x_1 + x_2 = 1 \Longrightarrow 1 - x_1 - x_2 = 0$$
so that $g(x_1, x_2)$ is given by:
$$g(x_1, x_2) = 1 - x_1 - x_2 = 0.$$
3. Here the Lagrangian is a function of $\lambda$ and $x_1$ and $x_2$ so we write:
$$L(\lambda, x_1, x_2) =$$
4. After the equal sign in 3. we write the objective function from 1 followed by $+\lambda(\ )$ as:
$$L(\lambda, x_1, x_2) = \underbrace{x_1^2 + \frac{1}{2}x_2^2}_{\text{objective function}} + \lambda(\_\_\_\_\_).$$
5. Inside the brackets we place the left-hand side of the constraint written as $\_\_\_ = 0$. Thus from 2. we have:
$$L(\lambda, x_1, x_2) = x_1^2 + \frac{1}{2}x_2^2 + \lambda\bigl(\underbrace{1 - x_1 - x_2}_{\text{from 2}}\bigr).$$
Example 2: Suppose a household wishes to maximize utility:
$$U(Q_1, Q_2)$$
where $Q_1$ and $Q_2$ are the amounts of good 1 and good 2 that the household consumes. The household has income $Y$, the price of $Q_1$ is $P_1$ and the price of $Q_2$ is $P_2$, so that the budget constraint is:
$$Y = P_1 Q_1 + P_2 Q_2.$$
We need to rewrite this as $g(Q_1, Q_2) = 0$. This can be done in a number of ways. Here we will use:
$$Y = P_1 Q_1 + P_2 Q_2 \Longrightarrow Y - P_1 Q_1 - P_2 Q_2 = 0$$
so that the constraint is:
$$g(Q_1, Q_2) = Y - P_1 Q_1 - P_2 Q_2 = 0.$$
The Lagrangian is therefore:
$$L(\lambda, Q_1, Q_2) = \underbrace{U(Q_1, Q_2)}_{\text{objective function}} + \lambda\bigl(\underbrace{Y - P_1 Q_1 - P_2 Q_2}_{\text{constraint}}\bigr).$$
Example 3: Suppose now that we have the particular utility function:
$$U(Q_1, Q_2) = 0.3\ln(Q_1) + 0.7\ln(Q_2)$$
and as above the budget constraint is:
$$Y = P_1 Q_1 + P_2 Q_2$$
or:
$$g(Q_1, Q_2) = Y - P_1 Q_1 - P_2 Q_2 = 0.$$
The Lagrangian is therefore:
$$L(\lambda, Q_1, Q_2) = \underbrace{0.3\ln(Q_1) + 0.7\ln(Q_2)}_{=U(Q_1, Q_2)} + \lambda\bigl(\underbrace{Y - P_1 Q_1 - P_2 Q_2}_{=g(Q_1, Q_2)}\bigr).$$
Example 4: Suppose a firm has a Cobb-Douglas production function:
$$Q = F(L, K) = L^{1/2} K^{3/4}.$$
The firm's objective is to minimize the cost of producing $Q$ units. Thus the objective function is costs:
$$WL + RK$$
where $W$ is the wage and $R$ the rental cost of capital.
The constraint is that $L$ and $K$ must produce $Q$ units of output (otherwise $L = K = 0$ minimizes costs!) so that $Q = F(L, K)$ is the constraint. Using:
$$Q = F(L, K) \Longrightarrow Q - L^{1/2} K^{3/4} = 0$$
we can rewrite the constraint as $g(L, K) = 0$ where:
$$g(L, K) = Q - L^{1/2} K^{3/4} = 0.$$
We therefore have the Lagrangian for cost minimization as:
$$L(\lambda, L, K) = \underbrace{WL + RK}_{\text{objective}} + \lambda\bigl(\underbrace{Q - L^{1/2} K^{3/4}}_{\text{constraint}}\bigr).$$
Example 5: Now consider the more general problem of a firm with a production function
$$Q = F(L, K)$$
which wishes to minimize the cost of producing $Q$ units. The Lagrangian is then:
$$L(\lambda, L, K) = \underbrace{WL + RK}_{\text{objective}} + \lambda\bigl(\underbrace{Q - F(L, K)}_{\text{constraint}}\bigr).$$

4.6.2 First-Order Conditions
For constrained optimization we have, just as before, first-order conditions. Now however the relevant first-order conditions are not with respect to $f(x_1, x_2, \dots, x_n)$ but with respect to the Lagrangian:
$$L(\lambda, x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n) + \lambda g(x_1, x_2, \dots, x_n).$$
We have:

Theorem 283 Suppose $x_1^*, x_2^*, \dots, x_n^*$ either maximizes or minimizes the objective function $f(x_1, x_2, \dots, x_n)$ subject to the constraint $g(x_1, x_2, \dots, x_n) = 0$. Then there is a $\lambda^*$ such that:
$$\frac{\partial L(\lambda^*, x_1^*, x_2^*, \dots, x_n^*)}{\partial\lambda} = g(x_1^*, x_2^*, \dots, x_n^*) = 0$$
$$\frac{\partial L(\lambda^*, x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} = \frac{\partial f(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} + \lambda^* \frac{\partial g(x_1^*, x_2^*, \dots, x_n^*)}{\partial x_i} = 0, \quad i = 1, 2, \dots, n.$$
Remark 1: There are $n+1$ first-order conditions leading to $n+1$ equations in $n+1$ unknowns: $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$. Since there are as many equations as unknowns, it should be possible to solve them for $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$.
Remark 2: It is important that the Lagrange multiplier $\lambda$ also have a $*$. To solve the first-order conditions you must solve for $\lambda^*$; in essence, then, there is no difference between the treatment of the $x_i$'s and $\lambda$.
Remark 3: The first of the first-order conditions:
$$\frac{\partial L(\lambda^*, x_1^*, x_2^*, \dots, x_n^*)}{\partial\lambda} = g(x_1^*, x_2^*, \dots, x_n^*) = 0$$
insures that $x_1^*, x_2^*, \dots, x_n^*$ satisfies the constraint.
Example 1 (continued): From the Lagrangian:
$$L(\lambda, x_1, x_2) = x_1^2 + \frac{1}{2}x_2^2 + \lambda(1 - x_1 - x_2)$$
we need to calculate three partial derivatives: $\frac{\partial L}{\partial\lambda}$, $\frac{\partial L}{\partial x_1}$ and $\frac{\partial L}{\partial x_2}$. We thus have:
$$\frac{\partial L}{\partial\lambda} = \frac{\partial}{\partial\lambda}\left(x_1^2 + \frac{1}{2}x_2^2 + \lambda(1 - x_1 - x_2)\right) = 1 - x_1 - x_2$$
$$\frac{\partial L}{\partial x_1} = \frac{\partial}{\partial x_1}\left(x_1^2 + \frac{1}{2}x_2^2 + \lambda(1 - x_1 - x_2)\right) = 2x_1 - \lambda$$
$$\frac{\partial L}{\partial x_2} = \frac{\partial}{\partial x_2}\left(x_1^2 + \frac{1}{2}x_2^2 + \lambda(1 - x_1 - x_2)\right) = x_2 - \lambda.$$
Putting a $*$ on $\lambda, x_1, x_2$ and setting the derivatives equal to zero we obtain:
$$1 - x_1^* - x_2^* = 0 \Longrightarrow x_1^* + x_2^* = 1$$
$$2x_1^* - \lambda^* = 0 \Longrightarrow x_1^* = \frac{1}{2}\lambda^*$$
$$x_2^* - \lambda^* = 0 \Longrightarrow x_2^* = \lambda^*.$$
Using the first result and adding up the second and third results we have:
$$x_1^* + x_2^* = \frac{1}{2}\lambda^* + \lambda^* = \frac{3}{2}\lambda^* \Longrightarrow 1 = \frac{3}{2}\lambda^* \Longrightarrow \lambda^* = \frac{2}{3}.$$
Now that we have $\lambda^*$ we can solve for $x_1^*$ and $x_2^*$ as:
$$x_1^* = \frac{1}{2}\lambda^* = \frac{1}{2}\times\frac{2}{3} = \frac{1}{3}, \quad x_2^* = \lambda^* = \frac{2}{3}.$$
Thus the solution is $\lambda^* = \frac{2}{3}$, $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$.
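These first-order conditions can also be solved mechanically; the sketch below (my addition, using `sympy`) reproduces the solution:

```python
import sympy as sp

lam, x1, x2 = sp.symbols('lambda x1 x2')
L = x1**2 + x2**2/2 + lam*(1 - x1 - x2)

# One first-order condition per argument of the Lagrangian
foc = [sp.diff(L, v) for v in (lam, x1, x2)]
sol = sp.solve(foc, (lam, x1, x2), dict=True)[0]

assert sol[lam] == sp.Rational(2, 3)
assert sol[x1] == sp.Rational(1, 3)
assert sol[x2] == sp.Rational(2, 3)
```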
Example 2 (continued): For the utility maximization problem we obtained the Lagrangian:
$$L(\lambda, Q_1, Q_2) = U(Q_1, Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2).$$
We need to calculate three partial derivatives: $\frac{\partial L}{\partial\lambda}$, $\frac{\partial L}{\partial Q_1}$ and $\frac{\partial L}{\partial Q_2}$ as:
$$\frac{\partial L}{\partial\lambda} = \frac{\partial}{\partial\lambda}\left(U(Q_1, Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)\right) = Y - P_1 Q_1 - P_2 Q_2$$
$$\frac{\partial L}{\partial Q_1} = \frac{\partial}{\partial Q_1}\left(U(Q_1, Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)\right) = \frac{\partial U(Q_1, Q_2)}{\partial Q_1} - \lambda P_1$$
$$\frac{\partial L}{\partial Q_2} = \frac{\partial}{\partial Q_2}\left(U(Q_1, Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)\right) = \frac{\partial U(Q_1, Q_2)}{\partial Q_2} - \lambda P_2.$$
Setting $\lambda = \lambda^*$, $Q_1 = Q_1^*$, $Q_2 = Q_2^*$ and setting the partial derivatives equal to zero we obtain three first-order conditions:
$$Y - P_1 Q_1^* - P_2 Q_2^* = 0$$
$$\frac{\partial U(Q_1^*, Q_2^*)}{\partial Q_1} - \lambda^* P_1 = 0$$
$$\frac{\partial U(Q_1^*, Q_2^*)}{\partial Q_2} - \lambda^* P_2 = 0.$$
Note that the first condition insures that:
$$P_1 Q_1^* + P_2 Q_2^* = Y$$
so that $Q_1^*$ and $Q_2^*$ satisfy the budget constraint.
Since we have made no assumptions about $U(Q_1, Q_2)$ we cannot hope to solve these three equations directly for $\lambda^*, Q_1^*, Q_2^*$. We can however use these equations to learn something about the nature of the optimal decision rule for the household. From the second and third of the first-order conditions we have:
$$\frac{\partial U(Q_1^*, Q_2^*)}{\partial Q_1} - \lambda^* P_1 = 0 \Longrightarrow MU_1(Q_1^*, Q_2^*) = \lambda^* P_1 \Longrightarrow \frac{MU_1(Q_1^*, Q_2^*)}{P_1} = \lambda^*$$
and
$$\frac{\partial U(Q_1^*, Q_2^*)}{\partial Q_2} - \lambda^* P_2 = 0 \Longrightarrow MU_2(Q_1^*, Q_2^*) = \lambda^* P_2 \Longrightarrow \frac{MU_2(Q_1^*, Q_2^*)}{P_2} = \lambda^*.$$
From these two results we conclude that:
$$\frac{MU_1(Q_1^*, Q_2^*)}{P_1} = \frac{MU_2(Q_1^*, Q_2^*)}{P_2} = \lambda^*.$$
This says that a rational household will allocate its income $Y$ between $Q_1$ and $Q_2$ so as to equate the ratio of each good's marginal utility to its price. This is the familiar condition from introductory economics. In introductory economics, however, one does not answer the question: what are $\frac{MU_1}{P_1}$ and $\frac{MU_2}{P_2}$ equal to? The answer is $\lambda^*$, the Lagrange multiplier. Later you will learn that $\lambda^*$ is in fact the marginal utility of income.
Example 3 (continued): For the Lagrangian:
$$L(\lambda, Q_1, Q_2) = 0.3\ln(Q_1) + 0.7\ln(Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)$$
we need to calculate three partial derivatives: $\frac{\partial L}{\partial\lambda}$, $\frac{\partial L}{\partial Q_1}$ and $\frac{\partial L}{\partial Q_2}$ as:
$$\frac{\partial L}{\partial\lambda} = Y - P_1 Q_1 - P_2 Q_2$$
$$\frac{\partial L}{\partial Q_1} = \frac{0.3}{Q_1} - \lambda P_1$$
$$\frac{\partial L}{\partial Q_2} = \frac{0.7}{Q_2} - \lambda P_2.$$
Setting $\lambda = \lambda^*$, $Q_1 = Q_1^*$, $Q_2 = Q_2^*$ and setting the partial derivatives equal to zero we obtain three first-order conditions:
$$Y - P_1 Q_1^* - P_2 Q_2^* = 0$$
$$\frac{0.3}{Q_1^*} - \lambda^* P_1 = 0$$
$$\frac{0.7}{Q_2^*} - \lambda^* P_2 = 0.$$
We have three equations with three unknowns. To solve them take the second and third equations to obtain:
$$\frac{0.3}{Q_1^*} - \lambda^* P_1 = 0 \Longrightarrow Q_1^* = \frac{0.3}{\lambda^* P_1}, \quad \frac{0.7}{Q_2^*} - \lambda^* P_2 = 0 \Longrightarrow Q_2^* = \frac{0.7}{\lambda^* P_2}.$$
We are not yet at the solution because we still do not know $\lambda^*$. Substituting these two into the budget constraint we obtain:
$$Y = P_1 Q_1^* + P_2 Q_2^* = P_1\left(\frac{0.3}{\lambda^* P_1}\right) + P_2\left(\frac{0.7}{\lambda^* P_2}\right) = \frac{0.3}{\lambda^*} + \frac{0.7}{\lambda^*} = \frac{1}{\lambda^*}.$$
From this it follows that $\lambda^* = \frac{1}{Y}$. This says that the marginal utility of income decreases with income, or richer people get less utility out of an extra dollar than poorer people. We then have:
$$Q_1^* = \frac{0.3}{\lambda^* P_1} = \frac{0.3}{\frac{1}{Y}P_1} = \frac{0.3Y}{P_1} \quad\text{and}\quad Q_2^* = \frac{0.7}{\lambda^* P_2} = \frac{0.7}{\frac{1}{Y}P_2} = \frac{0.7Y}{P_2}$$
so that $Q_1^* = \frac{0.3Y}{P_1}$ is the demand curve for good 1 and $Q_2^* = \frac{0.7Y}{P_2}$ is the demand curve for good 2.
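The same closed-form demands can be recovered mechanically; the sketch below (my addition) solves the three first-order conditions symbolically:

```python
import sympy as sp

lam, Q1, Q2 = sp.symbols('lambda Q1 Q2', positive=True)
Y, P1, P2 = sp.symbols('Y P1 P2', positive=True)

L = sp.Rational(3, 10)*sp.log(Q1) + sp.Rational(7, 10)*sp.log(Q2) \
    + lam*(Y - P1*Q1 - P2*Q2)

foc = [sp.diff(L, v) for v in (lam, Q1, Q2)]
sol = sp.solve(foc, (lam, Q1, Q2), dict=True)[0]

assert sp.simplify(sol[lam] - 1/Y) == 0                      # marginal utility of income
assert sp.simplify(sol[Q1] - sp.Rational(3, 10)*Y/P1) == 0   # demand for good 1
assert sp.simplify(sol[Q2] - sp.Rational(7, 10)*Y/P2) == 0   # demand for good 2
```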
Example 4 (continued): From the Lagrangian:
$$L(\lambda, L, K) = WL + RK + \lambda\left(Q - L^{1/2} K^{3/4}\right)$$
we obtain the first-order conditions:
$$\frac{\partial L(\lambda^*, L^*, K^*)}{\partial\lambda} = Q - L^{*1/2} K^{*3/4} = 0$$
$$\frac{\partial L(\lambda^*, L^*, K^*)}{\partial L} = W - \frac{1}{2}\lambda^* L^{*-1/2} K^{*3/4} = 0$$
$$\frac{\partial L(\lambda^*, L^*, K^*)}{\partial K} = R - \frac{3}{4}\lambda^* L^{*1/2} K^{*-1/4} = 0.$$
Using the $\ln(\ )$ function we can convert these into a system of 3 linear equations in 3 unknowns as:
$$Q - L^{*1/2} K^{*3/4} = 0 \Longrightarrow \frac{1}{2}\ln(L^*) + \frac{3}{4}\ln(K^*) = \ln(Q)$$
$$W - \frac{1}{2}\lambda^* L^{*-1/2} K^{*3/4} = 0 \Longrightarrow \ln(\lambda^*) - \frac{1}{2}\ln(L^*) + \frac{3}{4}\ln(K^*) = \ln(2W)$$
$$R - \frac{3}{4}\lambda^* L^{*1/2} K^{*-1/4} = 0 \Longrightarrow \ln(\lambda^*) + \frac{1}{2}\ln(L^*) - \frac{1}{4}\ln(K^*) = \ln\left(\frac{4}{3}R\right)$$
which can be written in matrix notation as:
$$\begin{bmatrix} 0 & \frac{1}{2} & \frac{3}{4} \\ 1 & -\frac{1}{2} & \frac{3}{4} \\ 1 & \frac{1}{2} & -\frac{1}{4} \end{bmatrix}\begin{bmatrix} \ln(\lambda^*) \\ \ln(L^*) \\ \ln(K^*) \end{bmatrix} = \begin{bmatrix} \ln(Q) \\ \ln(2W) \\ \ln\left(\frac{4}{3}R\right) \end{bmatrix}.$$
You can verify that
$$\det\begin{bmatrix} 0 & \frac{1}{2} & \frac{3}{4} \\ 1 & -\frac{1}{2} & \frac{3}{4} \\ 1 & \frac{1}{2} & -\frac{1}{4} \end{bmatrix} = \frac{5}{4}.$$
Using Cramer's rule (replacing the appropriate column with the right-hand side and dividing by $\frac{5}{4}$) we then find that:
$$\ln(\lambda^*) = -\frac{1}{5}\ln(Q) + \frac{2}{5}\ln(2W) + \frac{3}{5}\ln\left(\frac{4}{3}R\right)$$
$$\ln(L^*) = \frac{4}{5}\ln(Q) - \frac{3}{5}\ln(2W) + \frac{3}{5}\ln\left(\frac{4}{3}R\right)$$
$$\ln(K^*) = \frac{4}{5}\ln(Q) + \frac{2}{5}\ln(2W) - \frac{2}{5}\ln\left(\frac{4}{3}R\right)$$
from which it follows that:
$$\lambda^* = Q^{-1/5}(2W)^{2/5}\left(\frac{4}{3}R\right)^{3/5}$$
$$L^* = Q^{4/5}(2W)^{-3/5}\left(\frac{4}{3}R\right)^{3/5}$$
$$K^* = Q^{4/5}(2W)^{2/5}\left(\frac{4}{3}R\right)^{-2/5}.$$
From these we can work out the firm's cost function as:
$$C^*(Q, W, R) = WL^* + RK^* = \left(\left(\frac{2}{3}\right)^{3/5} + \left(\frac{3}{2}\right)^{2/5}\right) Q^{4/5} W^{2/5} R^{3/5}.$$
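The log-linear system and its Cramer's rule solution can be verified symbolically; this sketch (my addition) checks the determinant and two of the solved coefficients:

```python
import sympy as sp

Q, W, R = sp.symbols('Q W R', positive=True)

# The 3x3 system in (ln(lambda*), ln(L*), ln(K*)) from the first-order conditions
A = sp.Matrix([[0, sp.Rational(1, 2),  sp.Rational(3, 4)],
               [1, -sp.Rational(1, 2), sp.Rational(3, 4)],
               [1, sp.Rational(1, 2), -sp.Rational(1, 4)]])
b = sp.Matrix([sp.log(Q), sp.log(2*W), sp.log(4*R/3)])

assert A.det() == sp.Rational(5, 4)

log_lam, log_L, log_K = A.LUsolve(b)
assert sp.expand(log_lam - (-sp.Rational(1, 5)*sp.log(Q)
                            + sp.Rational(2, 5)*sp.log(2*W)
                            + sp.Rational(3, 5)*sp.log(4*R/3))) == 0
assert sp.expand(log_L - (sp.Rational(4, 5)*sp.log(Q)
                          - sp.Rational(3, 5)*sp.log(2*W)
                          + sp.Rational(3, 5)*sp.log(4*R/3))) == 0
```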
Let us now note some patterns that are generally true. Note that the Lagrange multiplier $\lambda^*$ turns out to be marginal cost; that is:
$$\frac{\partial C^*(Q, W, R)}{\partial Q} = \lambda^* = Q^{-1/5}(2W)^{2/5}\left(\frac{4}{3}R\right)^{3/5}.$$
The fact that marginal cost falls with $Q$ reflects the increasing returns to scale of this technology. $L^*$ and $K^*$ are the conditional factor demands for $L$ and $K$; that is, conditional on the firm producing an output level $Q$, these are the optimal (cost minimizing) amounts of labour and capital that it would demand. Note that $L^*$ and $K^*$ here are not the same as the ordinary demand and supply curves for labour, which are based on profit maximization and which have arguments $P$, $W$ and $R$ and not $Q$, $W$ and $R$ as here. It is also the case that:
$$\frac{\partial C^*(Q, W, R)}{\partial W} = L^* = Q^{4/5}(2W)^{-3/5}\left(\frac{4}{3}R\right)^{3/5}$$
$$\frac{\partial C^*(Q, W, R)}{\partial R} = K^* = Q^{4/5}(2W)^{2/5}\left(\frac{4}{3}R\right)^{-2/5}.$$
These two results are examples of Shephard's lemma.
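The closed forms and Shephard's lemma can be spot-checked numerically; the sketch below (my addition, with arbitrarily chosen values $Q=10$, $W=3$, $R=2$) plugs them back into the first-order conditions and approximates $\partial C^*/\partial W$ by a finite difference:

```python
# Hypothetical numeric check at Q=10, W=3, R=2 (values chosen for illustration)
Q, W, R = 10.0, 3.0, 2.0
lam = Q**(-1/5) * (2*W)**(2/5) * (4*R/3)**(3/5)
L   = Q**(4/5)  * (2*W)**(-3/5) * (4*R/3)**(3/5)
K   = Q**(4/5)  * (2*W)**(2/5)  * (4*R/3)**(-2/5)

# The three first-order conditions should hold
assert abs(Q - L**0.5 * K**0.75) < 1e-9
assert abs(W - 0.5 * lam * L**-0.5 * K**0.75) < 1e-9
assert abs(R - 0.75 * lam * L**0.5 * K**-0.25) < 1e-9

# Shephard's lemma: dC*/dW = L*, checked by a central finite difference
def cost(Q, W, R):
    coef = (2/3)**(3/5) + (3/2)**(2/5)
    return coef * Q**(4/5) * W**(2/5) * R**(3/5)

h = 1e-6
dC_dW = (cost(Q, W + h, R) - cost(Q, W - h, R)) / (2*h)
assert abs(dC_dW - L) < 1e-4
```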
Example 5 (continued): Given the Lagrangian from the cost minimization problem:
$$L(\lambda, L, K) = WL + RK + \lambda(Q - F(L, K))$$
we have the first-order conditions for cost minimization:
$$\frac{\partial L(\lambda^*, L^*, K^*)}{\partial\lambda} = Q - F(L^*, K^*) = 0$$
$$\frac{\partial L(\lambda^*, L^*, K^*)}{\partial L} = W - \lambda^* \frac{\partial F(L^*, K^*)}{\partial L} = 0$$
$$\frac{\partial L(\lambda^*, L^*, K^*)}{\partial K} = R - \lambda^* \frac{\partial F(L^*, K^*)}{\partial K} = 0.$$
The first condition insures that $Q = F(L^*, K^*)$, so that $L^*$ and $K^*$ produce $Q$ units of output. Recalling that
$$\frac{\partial F(L^*, K^*)}{\partial L} = MP_L(L^*, K^*), \quad \frac{\partial F(L^*, K^*)}{\partial K} = MP_K(L^*, K^*),$$
it follows from the second and third first-order conditions that:
$$\frac{1}{\lambda^*} = \frac{MP_L(L^*, K^*)}{W} = \frac{MP_K(L^*, K^*)}{R}.$$
4.6.3 Second-Order Conditions
As with unconstrained optimization, any solution to the first-order conditions can be either a maximum or a minimum. We can determine if $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ is a local maximum or minimum by examining the Hessian of the Lagrangian, given by:
$$H(\lambda, x_1, x_2, \dots, x_n) = \begin{bmatrix} \frac{\partial^2 L}{\partial\lambda^2} & \frac{\partial^2 L}{\partial\lambda\partial x_1} & \frac{\partial^2 L}{\partial\lambda\partial x_2} & \cdots & \frac{\partial^2 L}{\partial\lambda\partial x_n} \\ \frac{\partial^2 L}{\partial\lambda\partial x_1} & \frac{\partial^2 L}{\partial x_1^2} & \frac{\partial^2 L}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 L}{\partial x_1\partial x_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 L}{\partial\lambda\partial x_n} & \frac{\partial^2 L}{\partial x_1\partial x_n} & \frac{\partial^2 L}{\partial x_2\partial x_n} & \cdots & \frac{\partial^2 L}{\partial x_n^2} \end{bmatrix}$$
$$= \begin{bmatrix} 0 & \frac{\partial g(x)}{\partial x_1} & \frac{\partial g(x)}{\partial x_2} & \cdots & \frac{\partial g(x)}{\partial x_n} \\ \frac{\partial g(x)}{\partial x_1} & \frac{\partial^2 f(x)}{\partial x_1^2} + \lambda\frac{\partial^2 g(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1\partial x_2} + \lambda\frac{\partial^2 g(x)}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1\partial x_n} + \lambda\frac{\partial^2 g(x)}{\partial x_1\partial x_n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g(x)}{\partial x_n} & \frac{\partial^2 f(x)}{\partial x_1\partial x_n} + \lambda\frac{\partial^2 g(x)}{\partial x_1\partial x_n} & \frac{\partial^2 f(x)}{\partial x_2\partial x_n} + \lambda\frac{\partial^2 g(x)}{\partial x_2\partial x_n} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} + \lambda\frac{\partial^2 g(x)}{\partial x_n^2} \end{bmatrix}.$$
Remark 1: Note the zero in the upper left-hand corner of $H(\lambda, x_1, x_2, \dots, x_n)$. This occurs because the Lagrangian is a linear function of $\lambda$, so that:
$$L(\lambda, x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n) + \lambda g(x_1, x_2, \dots, x_n) \Longrightarrow \frac{\partial L}{\partial\lambda} = g(x_1, x_2, \dots, x_n) \Longrightarrow \frac{\partial^2 L}{\partial\lambda^2} = \frac{\partial}{\partial\lambda}g(x_1, x_2, \dots, x_n) = 0.$$
Remark 2: Note that the partial derivatives $\frac{\partial g(x)}{\partial x_i}$, $i = 1, 2, \dots, n$ of the constraint function appear along the border of the Hessian. For this reason $H(\lambda, x)$ is sometimes referred to as the bordered Hessian.
Remark 3: Since neither a positive definite nor a negative definite matrix can have a 0 along the diagonal, it follows that $L(\lambda, x_1, x_2, \dots, x_n)$ is neither concave nor convex, and so the second-order conditions cannot be the same as with unconstrained optimization. Another way of seeing this point is that the first leading principal minor is always
$$M_1 = 0,$$
so $M_1$ tells us nothing about whether we have a maximum or a minimum. This point is reinforced by the fact that the second leading principal minor is
$$M_2 = -\left(\frac{\partial g(x_1, x_2, \dots, x_n)}{\partial x_1}\right)^2 < 0$$
and so this also tells us nothing about whether $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ corresponds to a maximum or a minimum.
It is only at the third leading principal minor $M_3$ that the Hessian begins to tell us something about whether we have a maximum or a minimum. In particular let
$$M_3, M_4, M_5, \dots$$
be the leading principal minors of $H(\lambda^*, x_1^*, x_2^*, \dots, x_n^*)$. We then have:

Theorem 284 Suppose that $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ satisfy the first-order conditions from the Lagrangian and that the leading principal minors of the Hessian $H(\lambda^*, x_1^*, x_2^*, \dots, x_n^*)$ satisfy:
$$M_3 > 0, \quad M_4 < 0, \quad M_5 > 0, \quad \cdots$$
Then $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ corresponds to a constrained local maximum.

Theorem 285 Suppose that $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ satisfy the first-order conditions from the Lagrangian and that the leading principal minors of the Hessian $H(\lambda^*, x_1^*, x_2^*, \dots, x_n^*)$ satisfy:
$$M_3 < 0, \quad M_4 < 0, \quad M_5 < 0, \quad \cdots$$
Then $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ corresponds to a constrained local minimum.

Remark: Evaluating the leading principal minors of bordered Hessians is often a tedious business. Most if not all of the examples that we will consider involve the case where there are $n = 2$ independent variables or $x_i$'s, so that the Hessian is a $3\times 3$ matrix. In this case one need only calculate the determinant of the Hessian itself and check that it is positive for a maximum or negative for a minimum. In particular we have:

Theorem 286 If $n = 2$ then the solution to the first-order conditions $\lambda^*, x_1^*, x_2^*$ represents a local constrained maximum if
$$M_3 > 0$$
and a local constrained minimum if:
$$M_3 < 0.$$
Example 1 (continued): From the Lagrangian:
$$L(\lambda, x_1, x_2) = x_1^2 + \frac{1}{2}x_2^2 + \lambda(1 - x_1 - x_2)$$
the Hessian is calculated from the second derivatives of the Lagrangian as:
$$\frac{\partial^2 L(\lambda, x_1, x_2)}{\partial\lambda^2} = \frac{\partial}{\partial\lambda}(1 - x_1 - x_2) = 0$$
$$\frac{\partial^2 L(\lambda, x_1, x_2)}{\partial\lambda\partial x_1} = \frac{\partial}{\partial x_1}(1 - x_1 - x_2) = -1$$
$$\frac{\partial^2 L(\lambda, x_1, x_2)}{\partial\lambda\partial x_2} = \frac{\partial}{\partial x_2}(1 - x_1 - x_2) = -1$$
$$\frac{\partial^2 L(\lambda, x_1, x_2)}{\partial x_1^2} = \frac{\partial}{\partial x_1}(2x_1 - \lambda) = 2$$
$$\frac{\partial^2 L(\lambda, x_1, x_2)}{\partial x_1\partial x_2} = \frac{\partial}{\partial x_2}(2x_1 - \lambda) = 0$$
$$\frac{\partial^2 L(\lambda, x_1, x_2)}{\partial x_2^2} = \frac{\partial}{\partial x_2}(x_2 - \lambda) = 1$$
so that the Hessian is given by:
$$H(\lambda^*, x_1^*, x_2^*) = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 2 & 0 \\ -1 & 0 & 1 \end{bmatrix}.$$
Note that the Hessian for this problem does not depend on $\lambda$, $x_1$ and $x_2$. The second-order conditions for a minimum are then satisfied since:
$$M_3 = \det[H(\lambda, x_1, x_2)] = \det\begin{bmatrix} 0 & -1 & -1 \\ -1 & 2 & 0 \\ -1 & 0 & 1 \end{bmatrix} = -3 < 0.$$
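Since the Hessian here is a constant matrix, the determinant is easy to confirm numerically; the sketch below is my addition:

```python
import numpy as np

# Bordered Hessian of Example 1, which is constant in (lambda, x1, x2)
H = np.array([[ 0.0, -1.0, -1.0],
              [-1.0,  2.0,  0.0],
              [-1.0,  0.0,  1.0]])

M3 = np.linalg.det(H)
assert abs(M3 - (-3.0)) < 1e-12   # M3 < 0: a constrained local minimum
```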
Example 2 (continued): For the utility maximization problem with the Lagrangian:
$$L(\lambda, Q_1, Q_2) = U(Q_1, Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)$$
the Hessian is given by:
$$H(\lambda^*, Q_1^*, Q_2^*) = \begin{bmatrix} 0 & -P_1 & -P_2 \\ -P_1 & U_{11} & U_{12} \\ -P_2 & U_{12} & U_{22} \end{bmatrix}$$
where:
$$U_{11} \equiv \frac{\partial^2 U(Q_1^*, Q_2^*)}{\partial Q_1^2}, \quad U_{12} \equiv \frac{\partial^2 U(Q_1^*, Q_2^*)}{\partial Q_1\partial Q_2}, \quad U_{22} \equiv \frac{\partial^2 U(Q_1^*, Q_2^*)}{\partial Q_2^2}.$$
In order for $\lambda^*, Q_1^*, Q_2^*$ to be a utility maximum (and not a minimum!) we require:
$$M_3 = \det[H(\lambda^*, Q_1^*, Q_2^*)] = -P_1^2 U_{22} + 2P_1 P_2 U_{12} - P_2^2 U_{11} > 0.$$
This condition requires that the household's indifference curve be convex at $\lambda^*, Q_1^*, Q_2^*$, so that there is a local diminishing marginal rate of substitution.
Example 3 (continued): The Hessian of the Lagrangian:
$$L(\lambda, Q_1, Q_2) = 0.3\ln(Q_1) + 0.7\ln(Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)$$
at $\lambda^*, Q_1^*, Q_2^*$ is given by:
$$H(\lambda^*, Q_1^*, Q_2^*) = \begin{bmatrix} 0 & -P_1 & -P_2 \\ -P_1 & -\frac{0.3}{(Q_1^*)^2} & 0 \\ -P_2 & 0 & -\frac{0.7}{(Q_2^*)^2} \end{bmatrix}.$$
Since the Hessian is a $3\times 3$ matrix, we only have to calculate the determinant of $H(\lambda^*, Q_1^*, Q_2^*)$ and verify that it is positive to show that $\lambda^*, Q_1^*, Q_2^*$ correspond to a local maximum. Thus:
$$M_3 = \det[H(\lambda^*, Q_1^*, Q_2^*)] = \frac{0.7P_1^2}{(Q_2^*)^2} + \frac{0.3P_2^2}{(Q_1^*)^2} > 0$$
and the second-order conditions for a local maximum are satisfied.
Example 4 (continued): From the Lagrangian:
$$L(\lambda, L, K) = WL + RK + \lambda\left(Q - L^{1/2} K^{3/4}\right)$$
the Hessian is given by:
$$H(\lambda, L, K) = \begin{bmatrix} 0 & -\frac{1}{2}L^{-1/2}K^{3/4} & -\frac{3}{4}L^{1/2}K^{-1/4} \\ -\frac{1}{2}L^{-1/2}K^{3/4} & \frac{1}{4}\lambda L^{-3/2}K^{3/4} & -\frac{3}{8}\lambda L^{-1/2}K^{-1/4} \\ -\frac{3}{4}L^{1/2}K^{-1/4} & -\frac{3}{8}\lambda L^{-1/2}K^{-1/4} & \frac{3}{16}\lambda L^{1/2}K^{-5/4} \end{bmatrix}.$$
With some straightforward work it can be shown that:
$$M_3 = \det[H(\lambda^*, L^*, K^*)] = -\frac{15}{32}\lambda^* L^{*-1/2} K^{*1/4} < 0$$
(recall from the solution to the first-order conditions that $\lambda^* > 0$) and so $\lambda^*, L^*, K^*$ corresponds to a local minimum, as required.
Example 5 (continued): Given the Lagrangian
$$L(\lambda, L, K) = WL + RK + \lambda(Q - F(L, K))$$
the Hessian is given by:
$$H(\lambda, L, K) = \begin{bmatrix} 0 & -\frac{\partial F(L,K)}{\partial L} & -\frac{\partial F(L,K)}{\partial K} \\ -\frac{\partial F(L,K)}{\partial L} & -\lambda\frac{\partial^2 F(L,K)}{\partial L^2} & -\lambda\frac{\partial^2 F(L,K)}{\partial L\partial K} \\ -\frac{\partial F(L,K)}{\partial K} & -\lambda\frac{\partial^2 F(L,K)}{\partial L\partial K} & -\lambda\frac{\partial^2 F(L,K)}{\partial K^2} \end{bmatrix}.$$
To make the notation more compact define
$$F_L \equiv \frac{\partial F(L^*, K^*)}{\partial L}, \quad F_K \equiv \frac{\partial F(L^*, K^*)}{\partial K},$$
$$F_{LL} \equiv \frac{\partial^2 F(L^*, K^*)}{\partial L^2}, \quad F_{KK} \equiv \frac{\partial^2 F(L^*, K^*)}{\partial K^2} \quad\text{and}\quad F_{LK} \equiv \frac{\partial^2 F(L^*, K^*)}{\partial L\partial K}.$$
The second-order conditions then require that:
$$\det[H(\lambda^*, L^*, K^*)] = \det\begin{bmatrix} 0 & -F_L & -F_K \\ -F_L & -\lambda^* F_{LL} & -\lambda^* F_{LK} \\ -F_K & -\lambda^* F_{LK} & -\lambda^* F_{KK} \end{bmatrix} < 0$$
or
$$\lambda^*\left(F_L^2 F_{KK} - 2F_L F_K F_{LK} + F_K^2 F_{LL}\right) < 0.$$
As with utility maximization, this condition requires that the isoquant be bent towards the origin.
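The identity between $\det[H]$ and $\lambda^*(F_L^2 F_{KK} - 2F_L F_K F_{LK} + F_K^2 F_{LL})$ can be verified for a generic production function; the sketch below (my addition) does so symbolically:

```python
import sympy as sp

lam, L, K = sp.symbols('lambda L K', positive=True)
F = sp.Function('F')(L, K)   # generic production function

FL, FK = sp.diff(F, L), sp.diff(F, K)
H = sp.Matrix([
    [0,   -FL,                      -FK],
    [-FL, -lam*sp.diff(F, L, 2),    -lam*sp.diff(F, L, K)],
    [-FK, -lam*sp.diff(F, L, K),    -lam*sp.diff(F, K, 2)],
])

target = lam*(FL**2*sp.diff(F, K, 2) - 2*FL*FK*sp.diff(F, L, K)
              + FK**2*sp.diff(F, L, 2))
assert sp.simplify(H.det() - target) == 0
```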
4.6.4 Sufficient Conditions for a Global Maximum or Minimum
The second-order conditions we have examined only guarantee a local constrained maximum or minimum; they do not guarantee that $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ will correspond to a global maximum or minimum. As with unconstrained optimization, we can insure that $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ is a global maximum or minimum by appealing to quasi-concavity or quasi-convexity, but now we need to examine the properties of both the objective function $f(x_1, x_2, \dots, x_n)$ and the constraint function $g(x_1, x_2, \dots, x_n)$, as well as the sign of the Lagrange multiplier $\lambda^*$.
Almost all of the problems in economics one encounters at the intermediate level involve either a linear objective function $f(x_1, x_2, \dots, x_n)$, as in cost minimization, or a linear constraint function $g(x_1, x_2, \dots, x_n)$, as in utility maximization. For these cases we can use the following results:
Theorem 287 If 1) $f(x_1, x_2, \dots, x_n)$ is quasi-concave, 2) the constraint is linear, that is, it can be written as:
$$g(x_1, x_2, \dots, x_n) = a - b_1 x_1 - b_2 x_2 - \cdots - b_n x_n,$$
and 3) $\lambda^* > 0$, then $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ corresponds to a constrained global maximum.
Theorem 288 If 1) $f(x_1, x_2, \dots, x_n)$ is quasi-convex, 2) the constraint is linear, that is, it can be written as:
$$g(x_1, x_2, \dots, x_n) = a - b_1 x_1 - b_2 x_2 - \cdots - b_n x_n,$$
and 3) $\lambda^* > 0$, then $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ corresponds to a constrained global minimum.
Remark: In addition to requiring that the constraint be linear, note that we need to insure that the Lagrange multiplier is positive: $\lambda^* > 0$. If you find that $\lambda^* < 0$ this might be because of the way that you wrote down the constraint. For example with utility maximization if you wrote the constraint as:
$$g(Q_1, Q_2) = P_1 Q_1 + P_2 Q_2 - Y = 0$$
you would obtain $\lambda^* < 0$, while if instead you used:
$$g(Q_1, Q_2) = Y - P_1 Q_1 - P_2 Q_2 = 0$$
you would obtain $\lambda^* > 0$. Thus if you find $\lambda^* < 0$ you may be able to fix this problem by rewriting the constraint.
Example 1 (continued): Consider the problem of minimizing:
$$f(x_1, x_2) = x_1^2 + \frac{1}{2}x_2^2$$
subject to the constraint:
$$g(x_1, x_2) = 1 - x_1 - x_2 = 0.$$
We first show that $f(x_1, x_2)$ is convex and hence that it is quasi-convex. This follows since its Hessian is given by:
$$H(x_1, x_2) = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$$
which is positive definite for all $(x_1, x_2)$. The second condition, that the constraint is linear, is obviously satisfied. Finally we showed that:
$$\lambda^* = \frac{2}{3} > 0$$
so the third condition is also satisfied. It follows then that $x_1^* = \frac{1}{3}$ and $x_2^* = \frac{2}{3}$ is the global minimum for all $x_1$ and $x_2$ which satisfy:
$$g(x_1, x_2) = 1 - x_1 - x_2 = 0$$
or:
$$x_1 + x_2 = 1.$$
Example 2 (continued): Consider utility maximization where:
$$L(\lambda, Q_1, Q_2) = U(Q_1, Q_2) + \lambda(Y - P_1 Q_1 - P_2 Q_2)$$
and where we assume that $U(Q_1, Q_2)$ is quasi-concave so that the indifference curves have the correct shape. Thus the first requirement for a global maximum is satisfied by assumption. It is also the case that the constraint is linear since:
$$g(Q_1, Q_2) = Y - P_1 Q_1 - P_2 Q_2,$$
so the second requirement is also satisfied. Now from the first-order conditions we have:
$$MU_1(Q_1^*, Q_2^*) = \frac{\partial U(Q_1^*, Q_2^*)}{\partial Q_1} = \lambda^* P_1.$$
Since $P_1 > 0$ and $MU_1(Q_1^*, Q_2^*) > 0$ it follows that $\lambda^* > 0$, so that the third requirement for a global maximum is satisfied. We therefore conclude that $\lambda^*, Q_1^*, Q_2^*$ correspond to a global maximum.
Example 3 (continued): Consider the problem of maximizing:
$$U(Q_1, Q_2) = 0.3\ln(Q_1) + 0.7\ln(Q_2)$$
subject to the budget constraint:
$$g(Q_1, Q_2) = Y - P_1 Q_1 - P_2 Q_2 = 0.$$
We first show that $U(Q_1, Q_2)$ is concave and hence that it is quasi-concave. This follows since the Hessian of $U(Q_1, Q_2)$:
$$H(Q_1, Q_2) = \begin{bmatrix} -\frac{0.3}{Q_1^2} & 0 \\ 0 & -\frac{0.7}{Q_2^2} \end{bmatrix}$$
is a diagonal matrix with negative diagonal elements and hence is negative definite for all $Q_1$ and $Q_2$. Obviously the budget constraint is linear, so that the second condition for a global maximum is also satisfied. Finally we showed that
$$\lambda^* = \frac{1}{Y} > 0$$
so the third condition for a global maximum is satisfied. Thus:
$$Q_1^* = \frac{0.3Y}{P_1}, \quad Q_2^* = \frac{0.7Y}{P_2}$$
corresponds to a global maximum.
The other class of problems one typically encounters is where the objective function is linear and the constraint is quasi-concave or quasi-convex. In this case we have:

Theorem 289 If 1) $f(x_1, x_2, \dots, x_n)$ is linear, so that it can be written as:
$$f(x_1, x_2, \dots, x_n) = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n,$$
2) the constraint function $g(x_1, x_2, \dots, x_n)$ is quasi-concave (quasi-convex), and 3) $\lambda^* > 0$, then $\lambda^*, x_1^*, x_2^*, \dots, x_n^*$ correspond to a constrained global minimum (maximum).
Example 4 (continued): Consider the problem of minimizing cost:
W L + RK
subject to the constraint:
1
3
g (L; K) = Q ¡ L 2 K 4 = 0:
Obviously the objective function is a linear function of $L$ and $K$ and hence the first condition for a global minimum is satisfied. The constraint function $g(L, K)$ is not convex since its Hessian is given by:
$$H(L, K) = \begin{bmatrix} \frac{1}{4} L^{-\frac{3}{2}} K^{\frac{3}{4}} & -\frac{3}{8} L^{-\frac{1}{2}} K^{-\frac{1}{4}} \\[1ex] -\frac{3}{8} L^{-\frac{1}{2}} K^{-\frac{1}{4}} & \frac{3}{16} L^{\frac{1}{2}} K^{-\frac{5}{4}} \end{bmatrix}$$
which is not positive definite since:
$$M_2 = \frac{3}{64} L^{-1} K^{-\frac{1}{2}} - \frac{9}{64} L^{-1} K^{-\frac{1}{2}} = -\frac{6}{64} L^{-1} K^{-\frac{1}{2}} < 0.$$
We can show, however, that $g(L, K)$ is quasi-convex since:
$$g(L, K) = Q - L^{\frac{1}{2}} K^{\frac{3}{4}} = Q - \exp\left(-\left(-\frac{1}{2}\ln(L) - \frac{3}{4}\ln(K)\right)\right) = r(s(L, K))$$
where the monotonic function is:
$$r(x) = Q - \exp(-x), \quad r'(x) = \exp(-x) > 0$$
and the function:
$$s(L, K) = -\frac{1}{2}\ln(L) - \frac{3}{4}\ln(K)$$
is convex since it has a Hessian:
$$H_s(L, K) = \begin{bmatrix} \frac{1}{2L^2} & 0 \\[0.5ex] 0 & \frac{3}{4K^2} \end{bmatrix}$$
which is positive definite for all $L$ and $K$.
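This monotone-transform decomposition can be verified numerically. A minimal Python sketch, with illustrative values of $Q$, $L$ and $K$ (assumptions, not from the text), checks that $Q - \exp(-s(L, K))$ reproduces $Q - L^{1/2}K^{3/4}$:

```python
import math

# Illustrative values (assumed, not from the text).
Q, L, K = 10.0, 4.0, 9.0

g_direct = Q - L**0.5 * K**0.75                 # g(L, K) = Q - L^{1/2} K^{3/4}
s = -0.5 * math.log(L) - 0.75 * math.log(K)     # s(L, K), the convex function
g_composed = Q - math.exp(-s)                   # r(x) = Q - exp(-x) applied to s
assert abs(g_direct - g_composed) < 1e-9
```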
Finally we showed that:
$$\lambda^* = Q^{-\frac{1}{5}} (2W)^{\frac{2}{5}} \left(\frac{4R}{3}\right)^{\frac{3}{5}} > 0$$
so that the third condition for a global minimum is satisfied. We therefore conclude that:
$$\lambda^* = Q^{-\frac{1}{5}} (2W)^{\frac{2}{5}} \left(\frac{4R}{3}\right)^{\frac{3}{5}}$$
$$L^* = Q^{\frac{4}{5}} (2W)^{-\frac{3}{5}} \left(\frac{4R}{3}\right)^{\frac{3}{5}}$$
$$K^* = Q^{\frac{4}{5}} (2W)^{\frac{2}{5}} \left(\frac{4R}{3}\right)^{-\frac{2}{5}}$$
correspond to a constrained global minimum.
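These closed-form expressions are easy to sanity-check. The Python sketch below, with illustrative values $Q = 10$, $W = 3$, $R = 2$ (assumed, not from the text), verifies that $(L^*, K^*)$ lies exactly on the isoquant and that nearby points on the same isoquant cost more:

```python
# Illustrative output level and factor prices (assumed, not from the text).
Q, W, R = 10.0, 3.0, 2.0

lam  = Q**(-1/5) * (2*W)**(2/5) * (4*R/3)**(3/5)
Lopt = Q**(4/5) * (2*W)**(-3/5) * (4*R/3)**(3/5)
Kopt = Q**(4/5) * (2*W)**(2/5) * (4*R/3)**(-2/5)

# (L*, K*) satisfies the constraint Q = L^{1/2} K^{3/4} exactly.
assert abs(Lopt**0.5 * Kopt**0.75 - Q) < 1e-9
assert lam > 0

# Moving along the isoquant in either direction raises cost.
cost_star = W * Lopt + R * Kopt
for dL in (-0.5, -0.05, 0.05, 0.5):
    L2 = Lopt + dL
    K2 = (Q / L2**0.5)**(4/3)   # solve Q = L^{1/2} K^{3/4} for K
    assert W * L2 + R * K2 > cost_star
```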
Example 5 (continued): Consider the general cost minimization problem where the objective function is cost:
$$WL + RK$$
and the constraint is:
$$g(L, K) = Q - F(L, K) = 0$$
and where we assume that $F(L, K)$ is quasi-concave. (Assuming that $F(L, K)$ is quasi-concave is basically equivalent to assuming that the isoquants bend towards the origin.)
The objective function, which is cost, is obviously linear so that the first requirement for a global minimum is satisfied.
We now show that the constraint is quasi-convex. We have:
Proof. If $F(L, K)$ is quasi-concave then by definition it can be written as:
$$F(L, K) = r(s(L, K))$$
where $r'(x) > 0$ and $s(L, K)$ is concave. Now:
$$g(L, K) = Q - F(L, K) = a(b(L, K))$$
where the monotonic function is:
$$a(x) = Q - r(-x), \quad a'(x) = r'(-x) > 0$$
and the convex function is:
$$b(L, K) = -s(L, K),$$
since the negative of the concave function $s(L, K)$ is convex. It follows then that $g(L, K) = Q - F(L, K)$ is quasi-convex.
Finally we note from the first-order conditions for cost minimization that:
$$W = \lambda^* \frac{\partial F(L^*, K^*)}{\partial L}.$$
Since $W > 0$ and $\frac{\partial F(L^*, K^*)}{\partial L} > 0$, it follows that $\lambda^* > 0$ so the third requirement for a global minimum is satisfied. We conclude then that $\lambda^*, L^*, K^*$ correspond to a global minimum.
Sufficient Conditions when neither the objective function nor the constraint is linear
There are cases where neither the objective function $f(x_1, x_2, \ldots, x_n)$ nor the constraint $g(x_1, x_2, \ldots, x_n)$ is linear. In this case we have:
Theorem 290 If 1) both $f(x_1, x_2, \ldots, x_n)$ and $g(x_1, x_2, \ldots, x_n)$ in the Lagrangian:
$$L(\lambda, x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n) + \lambda g(x_1, x_2, \ldots, x_n)$$
are quasi-concave (quasi-convex), and 2) if $\lambda^*, x_1^*, x_2^*, \ldots, x_n^*$ solve the first-order conditions from the Lagrangian with $\lambda^* > 0$, then $\lambda^*, x_1^*, x_2^*, \ldots, x_n^*$ correspond to a constrained global maximum (minimum).
Example: Consider a country that produces two goods $Q_1$ and $Q_2$ with utility function:
$$U(Q_1, Q_2) = Q_1 Q_2$$
which it wishes to maximize. This is clearly non-linear. The production possibilities curve or constraint satisfies:
$$Q_1^2 + Q_2^2 = 1$$
which is plotted below.

[Figure: Production Possibilities Curve, with $Q_1$ and $Q_2$ on the axes.]
The constraint can be written as:
$$g(Q_1, Q_2) = 1 - Q_1^2 - Q_2^2 = 0$$
and is also clearly non-linear. This leads to the Lagrangian:
$$L(\lambda, Q_1, Q_2) = Q_1 Q_2 + \lambda\left(1 - Q_1^2 - Q_2^2\right).$$
The first-order conditions are:
$$\frac{\partial L(\lambda^*, Q_1^*, Q_2^*)}{\partial \lambda} = 1 - (Q_1^*)^2 - (Q_2^*)^2 = 0 \implies (Q_1^*)^2 + (Q_2^*)^2 = 1$$
$$\frac{\partial L(\lambda^*, Q_1^*, Q_2^*)}{\partial Q_1} = Q_2^* - 2\lambda^* Q_1^* = 0 \implies (Q_2^*)^2 = 4(\lambda^*)^2 (Q_1^*)^2$$
$$\frac{\partial L(\lambda^*, Q_1^*, Q_2^*)}{\partial Q_2} = Q_1^* - 2\lambda^* Q_2^* = 0 \implies (Q_1^*)^2 = 4(\lambda^*)^2 (Q_2^*)^2.$$
Adding the second and third results we obtain:
$$(Q_1^*)^2 + (Q_2^*)^2 = 4(\lambda^*)^2 \left((Q_2^*)^2 + (Q_1^*)^2\right) \implies \lambda^* = \frac{1}{2}.$$
Using $\lambda^* = \frac{1}{2}$ in the second condition we obtain:
$$Q_2^* - 2\lambda^* Q_1^* = 0 \implies Q_2^* = Q_1^*$$
so that using $Q_2^* = Q_1^*$ in the constraint yields:
$$(Q_1^*)^2 + (Q_2^*)^2 = 1 \implies Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}.$$
Thus the solution to the first-order conditions is: $\lambda^* = \frac{1}{2}$, $Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}$.
We can show that $f(Q_1, Q_2)$ is quasi-concave since:
$$f(Q_1, Q_2) = e^{\ln(Q_1) + \ln(Q_2)}$$
where $e^x$ is a monotonic transformation and $\ln(Q_1) + \ln(Q_2)$ is concave since it has a Hessian:
$$H_f(Q_1, Q_2) = \begin{bmatrix} -\frac{1}{Q_1^2} & 0 \\[0.5ex] 0 & -\frac{1}{Q_2^2} \end{bmatrix}$$
which is a diagonal matrix with all diagonal elements negative and hence is negative definite for all $Q_1, Q_2$.
The constraint $g(Q_1, Q_2)$ is concave, and hence quasi-concave, since it has a Hessian:
$$H_g(Q_1, Q_2) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$$
which is negative definite for all $Q_1, Q_2$. Finally $\lambda^* = \frac{1}{2} > 0$. Thus $f(Q_1, Q_2)$ and $g(Q_1, Q_2)$ are quasi-concave and $\lambda^* > 0$ is satisfied, so that $\lambda^* = \frac{1}{2}$, $Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}$ is a global maximum.
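As a numerical check, the sketch below parametrizes the production possibilities curve as $(Q_1, Q_2) = (\cos t, \sin t)$ and confirms that no point on it gives more utility than $U = \frac{1}{2}$, attained at $Q_1 = Q_2 = \frac{1}{\sqrt{2}}$:

```python
import math

q = 1 / math.sqrt(2)
u_star = q * q          # U(Q1*, Q2*) = 1/2
assert abs(u_star - 0.5) < 1e-12

# Sweep the production possibilities curve Q1^2 + Q2^2 = 1 in the
# positive quadrant; no point gives higher utility than 1/2.
for k in range(1, 1000):
    t = (math.pi / 2) * k / 1000
    q1, q2 = math.cos(t), math.sin(t)
    assert q1 * q2 <= u_star + 1e-12
```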
The constrained maximum $Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}$ yields $U(Q_1^*, Q_2^*) = \frac{1}{2}$ units of utility and occurs where the indifference curve $Q_2 = \frac{1}{2Q_1}$ is just tangent to the production possibilities curve as illustrated below:

[Figure: the indifference curve $Q_2 = \frac{1}{2Q_1}$ tangent to the production possibilities curve, with $Q_1$ on the horizontal axis and $Q_2$ on the vertical axis.]

4.7 Econometrics

4.7.1 Linear Regression
Consider the simple linear regression model:
$$Y_i = \alpha + \beta X_i + e_i, \quad i = 1, 2, \ldots, n.$$
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are the values of $\alpha$ and $\beta$ which minimize the sum of squares function:
$$S(\alpha, \beta) = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2.$$
We have, using the sum and chain rules, that:
$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} = -2 \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)$$
$$\frac{\partial S(\alpha, \beta)}{\partial \beta} = -2 \sum_{i=1}^{n} X_i (Y_i - \alpha - \beta X_i)$$
so that the first-order conditions for a minimum are:
$$\frac{\partial S(\hat{\alpha}, \hat{\beta})}{\partial \alpha} = -2 \sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right) = 0 \implies n\hat{\alpha} + \left(\sum_{i=1}^{n} X_i\right)\hat{\beta} = \sum_{i=1}^{n} Y_i$$
$$\frac{\partial S(\hat{\alpha}, \hat{\beta})}{\partial \beta} = -2 \sum_{i=1}^{n} X_i\left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right) = 0 \implies \left(\sum_{i=1}^{n} X_i\right)\hat{\alpha} + \left(\sum_{i=1}^{n} X_i^2\right)\hat{\beta} = \sum_{i=1}^{n} X_i Y_i$$
or in matrix notation:
$$\begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}.$$
From the first equation it is easy to show that:
$$\hat{\alpha} = \bar{Y} - \bar{X}\hat{\beta}$$
so the difficulty is in obtaining $\hat{\beta}$. Solving for $\hat{\beta}$ using Cramer's rule we find that:
$$\hat{\beta} = \frac{n \sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}.$$
Example: Suppose one has data on the consumption of $n = 4$ families along with their income as:
$$Y_i = 72, \; 58, \; 63, \; 55$$
$$X_i = 98, \; 80, \; 91, \; 73$$
where $Y_i$ is the consumption and $X_i$ is the income of family $i$. We wish to estimate a consumption function of the form:
$$Y_i = \alpha + \beta X_i + e_i$$
where $\beta$ is the marginal propensity to consume.
The sum of squares is then:
$$S(\alpha, \beta) = (72 - \alpha - 98\beta)^2 + (58 - \alpha - 80\beta)^2 + (63 - \alpha - 91\beta)^2 + (55 - \alpha - 73\beta)^2.$$
To calculate $\hat{\beta}$ and $\hat{\alpha}$ we need:
$$\sum_{i=1}^{4} X_i = 98 + 80 + 91 + 73 = 342 \implies \bar{X} = \frac{342}{4} = 85.5$$
$$\sum_{i=1}^{4} X_i^2 = 98^2 + 80^2 + 91^2 + 73^2 = 29614$$
$$\sum_{i=1}^{4} X_i Y_i = 98 \times 72 + 80 \times 58 + 91 \times 63 + 73 \times 55 = 21444$$
$$\sum_{i=1}^{4} Y_i = 72 + 58 + 63 + 55 = 248 \implies \bar{Y} = \frac{248}{4} = 62.$$
It follows then that:
$$\hat{\beta} = \frac{4 \times 21444 - 342 \times 248}{4 \times 29614 - 342^2} = 0.643$$
and:
$$\hat{\alpha} = \bar{Y} - \bar{X}\hat{\beta} = 62 - 0.643 \times 85.5 = 7.023.$$
Thus the estimated consumption function is:
$$\hat{Y}_i = 7.023 + 0.643X_i$$
and the estimated marginal propensity to consume is $0.643$.
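The least squares formulas are easy to code directly; here is a minimal Python sketch using the family data from this example. (One caveat on rounding: carrying the unrounded $\hat{\beta} = 960/1492 \approx 0.6434$ gives $\hat{\alpha} \approx 6.987$; the value $7.023$ above comes from rounding $\hat{\beta}$ to $0.643$ before computing $\hat{\alpha}$.)

```python
X = [98, 80, 91, 73]   # income
Y = [72, 58, 63, 55]   # consumption
n = len(X)

sum_x  = sum(X)                              # 342
sum_y  = sum(Y)                              # 248
sum_xx = sum(x * x for x in X)               # 29614
sum_xy = sum(x * y for x, y in zip(X, Y))    # 21444

# Cramer's-rule solution of the normal equations.
beta_hat  = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
alpha_hat = sum_y / n - (sum_x / n) * beta_hat

assert abs(beta_hat - 0.643) < 1e-3          # marginal propensity to consume
```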
4.7.2 Maximum Likelihood

Maximum likelihood can also be applied to cases where $\theta$ is a vector of parameters, so that:
$$\theta = (\theta_1, \theta_2, \ldots, \theta_p)$$
and the likelihood:
$$L(\theta) = L(\theta_1, \theta_2, \ldots, \theta_p)$$
is a multivariate function.
As before we estimate $\theta$ by maximizing $L(\theta)$ and denote the solution as $\hat{\theta}$, which solves the first-order conditions:
$$\frac{\partial L(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)}{\partial \theta_j} = 0 \text{ for } j = 1, 2, \ldots, p$$
or equivalently, if we define the log-likelihood as $l(\theta) = \ln(L(\theta))$, then:
$$\frac{\partial l(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)}{\partial \theta_j} = 0 \text{ for } j = 1, 2, \ldots, p.$$
Once $\hat{\theta}$ is found from the first-order conditions, a 95% confidence interval for each $\theta_j$ can be found as follows. Let the $p \times p$ matrix:
$$H(\hat{\theta})$$
be the Hessian of the log-likelihood evaluated at $\hat{\theta}$; its negative, $-H(\hat{\theta})$, is referred to as the (observed) information matrix. Now calculate:
$$\Delta = \left(-H(\hat{\theta})\right)^{-1}$$
and let $\delta_j$ be the $j$th diagonal element of $\Delta$. Then a 95% confidence interval for the unknown $\theta_j$ is:
$$\hat{\theta}_j \pm 1.96 \times \sqrt{\delta_j}.$$
Example 1: Suppose that $Y_i \sim N\left[\mu, \sigma^2\right]$ so that $Y_i$ has a mean of $\mu$ and a standard deviation of $\sigma$. We wish to estimate $\theta_1 = \mu$ and $\theta_2 = \sigma$ using maximum likelihood from a sample $Y_1, Y_2, \ldots, Y_n$. The likelihood function is:
$$L(\mu, \sigma) = (2\pi)^{-\frac{n}{2}} \sigma^{-n} e^{-\frac{1}{2\sigma^2}\left((Y_1 - \mu)^2 + (Y_2 - \mu)^2 + \cdots + (Y_n - \mu)^2\right)}.$$
The log-likelihood is given by:
$$l(\mu, \sigma) = \ln(L(\mu, \sigma)) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2}\left((Y_1 - \mu)^2 + (Y_2 - \mu)^2 + \cdots + (Y_n - \mu)^2\right).$$
The maximum likelihood estimator of $\mu$ is $\hat{\mu} = \bar{Y}$, the sample mean, since:
$$\frac{\partial l(\mu, \sigma)}{\partial \mu} = \frac{(Y_1 - \mu) + (Y_2 - \mu) + \cdots + (Y_n - \mu)}{\sigma^2}$$
so that:
$$\frac{\partial l(\hat{\mu}, \hat{\sigma})}{\partial \mu} = \frac{(Y_1 - \hat{\mu}) + (Y_2 - \hat{\mu}) + \cdots + (Y_n - \hat{\mu})}{\hat{\sigma}^2} = 0$$
$$\implies Y_1 + Y_2 + \cdots + Y_n = n\hat{\mu} \implies \hat{\mu} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} = \bar{Y}.$$
The maximum likelihood estimator of $\sigma$ is the sample standard deviation since:
$$\frac{\partial l(\mu, \sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\left((Y_1 - \mu)^2 + (Y_2 - \mu)^2 + \cdots + (Y_n - \mu)^2\right)$$
and:
$$\frac{\partial l(\hat{\mu}, \hat{\sigma})}{\partial \sigma} = -\frac{n}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^3}\left((Y_1 - \hat{\mu})^2 + (Y_2 - \hat{\mu})^2 + \cdots + (Y_n - \hat{\mu})^2\right) = 0$$
$$\implies -\frac{n}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^3}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right) = 0$$
$$\implies \hat{\sigma}^2 = \frac{1}{n}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right)$$
$$\implies \hat{\sigma} = \sqrt{\frac{1}{n}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right)}.$$
Now to calculate confidence intervals for $\hat{\mu}$ and $\hat{\sigma}$ we need the Hessian of $l(\mu, \sigma)$. We have:
$$\frac{\partial^2 l(\hat{\mu}, \hat{\sigma})}{\partial \mu^2} = \frac{-1 - 1 - \cdots - 1}{\hat{\sigma}^2} = -\frac{n}{\hat{\sigma}^2}$$
$$\frac{\partial^2 l(\hat{\mu}, \hat{\sigma})}{\partial \mu \partial \sigma} = -2\,\frac{(Y_1 - \hat{\mu}) + (Y_2 - \hat{\mu}) + \cdots + (Y_n - \hat{\mu})}{\hat{\sigma}^3} = -2\,\frac{Y_1 + Y_2 + \cdots + Y_n - n\bar{Y}}{\hat{\sigma}^3} = 0$$
$$\frac{\partial^2 l(\hat{\mu}, \hat{\sigma})}{\partial \sigma^2} = \frac{n}{\hat{\sigma}^2} - \frac{3}{\hat{\sigma}^4}\underbrace{\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right)}_{= n\hat{\sigma}^2} = -\frac{2n}{\hat{\sigma}^2}$$
and so the Hessian is:
$$H(\hat{\mu}, \hat{\sigma}) = \begin{bmatrix} -\frac{n}{\hat{\sigma}^2} & 0 \\[0.5ex] 0 & -\frac{2n}{\hat{\sigma}^2} \end{bmatrix}$$
and hence:
$$\Delta = (-H(\hat{\mu}, \hat{\sigma}))^{-1} = \begin{bmatrix} \frac{\hat{\sigma}^2}{n} & 0 \\[0.5ex] 0 & \frac{\hat{\sigma}^2}{2n} \end{bmatrix}.$$
Thus a 95% confidence interval for the unknown $\mu$ takes the form:
$$\hat{\mu} \pm 1.96\sqrt{\frac{\hat{\sigma}^2}{n}}$$
while a 95% confidence interval for the unknown $\sigma$ takes the form:
$$\hat{\sigma} \pm 1.96\sqrt{\frac{\hat{\sigma}^2}{2n}}.$$
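The closed-form second derivatives can be checked against numerical differentiation of $l(\mu, \sigma)$. A Python sketch (the sample values are illustrative assumptions) comparing central finite differences with $-n/\hat{\sigma}^2$ and $-2n/\hat{\sigma}^2$:

```python
import math

def loglik(mu, sigma, data):
    # l(mu, sigma) = -(n/2) ln(2 pi) - n ln(sigma) - (1/(2 sigma^2)) sum (Yi - mu)^2
    n = len(data)
    s = sum((y - mu) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi) - n * math.log(sigma) - s / (2 * sigma**2)

# Illustrative sample (assumed values).
data = [5.5, 3.3, 7.1, 9.2, 4.1]
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in data) / n)

h = 1e-4
def second_deriv(f, x):
    # central finite-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

d2_mu = second_deriv(lambda m: loglik(m, sigma_hat, data), mu_hat)
d2_sigma = second_deriv(lambda s: loglik(mu_hat, s, data), sigma_hat)

assert abs(d2_mu - (-n / sigma_hat**2)) < 1e-3        # matches -n/sigma^2
assert abs(d2_sigma - (-2 * n / sigma_hat**2)) < 1e-3  # matches -2n/sigma^2
```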
Example 2: Suppose we are given $n = 5$ observations:
$$Y_1 = 5.5, \; Y_2 = 3.3, \; Y_3 = 7.1, \; Y_4 = 9.2, \; Y_5 = 4.1.$$
We are seeking the values of $\mu$ and $\sigma$ which maximize the log-likelihood $l(\mu, \sigma)$, plotted below.

[Figure: surface plot of the log-likelihood $\ln(L)$ as a function of $\mu$ and $\sigma$.]

We have:
$$\hat{\mu} = \bar{Y} = \frac{5.5 + 3.3 + 7.1 + 9.2 + 4.1}{5} = 5.84$$
and:
$$\hat{\sigma} = \sqrt{\frac{(5.5 - 5.84)^2 + (3.3 - 5.84)^2 + (7.1 - 5.84)^2 + (9.2 - 5.84)^2 + (4.1 - 5.84)^2}{5}} = 2.12.$$
A 95% confidence interval for the unknown $\mu$ is then:
$$5.84 \pm 1.96\sqrt{\frac{2.12^2}{5}}$$
or:
$$5.84 \pm 1.8583.$$
A 95% confidence interval for the unknown $\sigma$ is then:
$$2.12 \pm 1.96\sqrt{\frac{2.12^2}{2 \times 5}}$$
or:
$$2.12 \pm 1.314.$$
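The arithmetic of this example takes only a few lines of Python; a minimal sketch reproducing $\hat{\mu}$, $\hat{\sigma}$ and both interval half-widths:

```python
import math

data = [5.5, 3.3, 7.1, 9.2, 4.1]
n = len(data)

mu_hat = sum(data) / n                                           # sample mean
sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in data) / n)  # MLE of sigma

half_width_mu = 1.96 * math.sqrt(sigma_hat**2 / n)
half_width_sigma = 1.96 * math.sqrt(sigma_hat**2 / (2 * n))

assert abs(mu_hat - 5.84) < 1e-9
assert abs(sigma_hat - 2.12) < 1e-9       # 2.12^2 = 4.4944 exactly
assert abs(half_width_mu - 1.8583) < 1e-3
assert abs(half_width_sigma - 1.314) < 1e-3
```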