Mathematical Economics Lecture Notes

MATHEMATICAL ECONOMICS
Lecture Notes¹
Alexander W. Richter²
Department of Economics
Auburn University
Fall 2012

¹ I am especially grateful to Juergen Jung and Mike Treuden for their tremendous contributions to these notes. I also thank Michael Kaganovich and Eric Leeper for their guidance and for giving me the opportunity to develop and teach this course during graduate school. Several classes of Indiana University graduate students provided suggestions and corrections. Comments are welcome and all remaining errors are mine.
² Correspondence: Department of Economics, Auburn University, Auburn, AL 36849, USA. Phone: +1(334)844-8638. E-mail: [email protected].
Contents

Preface

Chapter 1  Mathematical Preliminaries
  1.1  Single-Variable Calculus
    1.1.1  Limits of Functions
    1.1.2  Definition of a Derivative and Tangent Lines
    1.1.3  Properties of the Differential
    1.1.4  Single-Variable Maximization
    1.1.5  Intermediate and Mean Value Theorems
    1.1.6  Taylor Approximations
    1.1.7  Laws of Logarithms
    1.1.8  Infinite Series
  1.2  Multivariate Calculus
    1.2.1  Level Surfaces
    1.2.2  Projections
    1.2.3  Gradient Vector and its Relationship to the Level Surface
    1.2.4  Gradients and Tangent Planes
    1.2.5  Chain Rule
    1.2.6  Second Order Derivatives and Hessians
  1.3  Basic Analysis
    1.3.1  Induction and Examples
    1.3.2  Neighborhoods and Open and Closed Sets
    1.3.3  Convergence and Boundedness
    1.3.4  Compactness

Chapter 2  Linear Algebra
  2.1  Vector Spaces and Subspaces
  2.2  Linear Combinations and Spanning Conditions
  2.3  Linear Independence and Linear Dependence
  2.4  Bases and Dimension
  2.5  Linear Transformations

Chapter 3  Basic Matrix Properties and Operations
  3.1  Determinants
    3.1.1  Minors, Cofactors, and Evaluating Determinants
    3.1.2  Properties of Determinants
    3.1.3  Singular Matrices and Rank
  3.2  Inverses of Matrices
    3.2.1  Computation of Inverses
    3.2.2  Properties of Inverses
  3.3  Quadratic Forms and Definiteness
    3.3.1  Quadratic Forms
    3.3.2  Definiteness of Quadratic Forms
  3.4  Eigenvalues and Eigenvectors
    3.4.1  Properties of Eigenvalues and Eigenvectors
    3.4.2  Definiteness and Eigenvalues

Chapter 4  Concavity, Convexity, Quasi-Concavity, and Quasi-Convexity
  4.1  Convex Sets
  4.2  Concave and Convex Functions
  4.3  Concavity, Convexity, and Definiteness
  4.4  Quasi-concave and Quasi-convex Functions
  4.5  Quasi-concavity, Quasi-convexity, and Definiteness

Chapter 5  Optimization
  5.1  Unconstrained Optimization
  5.2  Constrained Optimization I: Equality Constraints
  5.3  Constrained Optimization II: Non-negative Variables
  5.4  Constrained Optimization III: Inequality Constraints

Chapter 6  Comparative Statics
  6.1  Cramer's Rule
  6.2  Implicit Function Theorem
    6.2.1  Several Exogenous Variables
    6.2.2  The General Case

Chapter 7  Introduction to Complex Numbers
  7.1  Basic Operations
    7.1.1  Sums and Products
    7.1.2  Moduli
    7.1.3  Complex Conjugates
  7.2  Exponential Form
  7.3  Complex Eigenvalues

Chapter 8  Linear Difference Equations and Lag Operators
  8.1  Lag Operators
  8.2  First-Order Difference Equations
  8.3  Second-Order Difference Equations
    8.3.1  Distinct Real Eigenvalues
    8.3.2  Complex Eigenvalues
    8.3.3  Stability Conditions for Distinct Eigenvalues
    8.3.4  Repeated Real Eigenvalues
  8.4  Systems of Linear Difference Equations
    8.4.1  Solution Technique with Real Eigenvalues
    8.4.2  Solution Technique with Complex Eigenvalues

Chapter 9  Linear Differential Equations
  9.1  First-Order Differential Equations
  9.2  Solving Systems of Differential Equations
    9.2.1  Preliminaries
    9.2.2  Homogeneous Systems
    9.2.3  Constant Coefficient Homogeneous Systems
    9.2.4  Constant Coefficient Homogeneous Systems Revisited

Chapter 10  Introduction to Discrete Time Models
  10.1  Deriving Euler Equations
    10.1.1  The Dynamic Program: Part I
    10.1.2  The Dynamic Program: Part II
  10.2  Solving Dynamic Models
    10.2.1  Method of Undetermined Coefficients
    10.2.2  Value Function Iteration
    10.2.3  Euler Equation Iteration
    10.2.4  Howard's Improvement Algorithm
  10.3  Additional Examples
    10.3.1  Model 1: Inelastic Labor, Capital Adjustment Costs
    10.3.2  Model 2: Elastic Labor, Capital Adjustment Costs
    10.3.3  Model 3: Habit Persistence

Chapter 11  Linear Discrete Time Models
  11.1  Analytical Solution Methods
    11.1.1  Model Setup
    11.1.2  Log-Linear System
    11.1.3  Solution Method I: Direct Approach
    11.1.4  Solution Method II: MSV Approach
  11.2  Numerical Solution Method
    11.2.1  Introduction to Gensys
    11.2.2  Model Setup
    11.2.3  Deterministic Steady State
    11.2.4  Log-Linear System
    11.2.5  Gensys Form
    11.2.6  Gensys: Behind the Code

Chapter 12  Introduction to Continuous Time Models
  12.1  Maximum Principle
  12.2  Ramsey Problem
    12.2.1  Model Setup
    12.2.2  Linear System

Bibliography
List of Figures

1.1   Definition of a Limit
1.2   Intermediate Value Theorem
1.3   Rolle's Theorem and Mean Value Theorem
1.4   Log-Linear Approximation
1.5   Projection of x onto y
1.6   Gradient Vector Perpendicular to the Level Curve
1.7   Interior and Boundary Points
4.1   Convex and Non-Convex Sets
4.2   Concave and Convex Functions
4.3   Quasi-concave but not concave
4.4   Not Quasi-concave
5.1   Strict and/or Global Extrema
5.2   Constrained Optimization and the Lagrange Principle
7.1   Complex Number in Polar Form
8.1   Second Order Difference Equation: Regions of Stability
12.1  Ramsey Model Phase Diagram
Preface
These notes are intended for a one-semester course in mathematical economics. The goal of this
course is to help prepare students for the mathematical rigor of graduate economics by providing
a balance between theory and application. There are dozens of examples throughout the notes,
which demonstrate many of the theoretical concepts and theorems. Below is a short list of the notation used throughout these notes.
Symbol      Technical Description           Translation
∀           universal quantification        for all
∃           existential quantification      there exists
∧           logical conjunction             and
∨           logical disjunction             or
⇒           material implication            implies/if, then
⇔           material equivalence            if and only if (iff)
≡ or :=     definition                      is defined as/is equivalent to
∈           set membership                  in/element of
| or :      set restriction                 such that/given that
⊆           subset                          contained in
N           natural numbers                 set of positive integers
Z           set of all integers             positive, negative, or zero
Q           set of all rational numbers     set of all proper and improper fractions
R           set of all real numbers         all rational and irrational numbers
Chapter 1
Mathematical Preliminaries
1.1 Single-Variable Calculus
1.1.1 Limits of Functions
Definition 1.1.1 (Limit of a Function). Let f be a function defined on some open interval that contains the number a, except possibly a itself. We say that the limit of f(x) as x approaches a is L, and we write
lim_{x→a} f(x) = L,
if for every number ε > 0 there is a number δ > 0 such that
|f(x) − L| < ε whenever 0 < |x − a| < δ.
Less formally, if anytime x gets within a reasonable distance of a, the function f comes within
a reasonable neighborhood of L, then we say that the limit of f (x) as x approaches a is L (see
figure 1.1). Note that logically the statement “q whenever p” is equivalent to “if p, then q”, where
it is customary to refer to p as the hypothesis and q as the conclusion. The related implication
∼q ⇒ ∼p is called the contrapositive.
Example 1.1.1. Using the definition of a limit, prove the following:
(a) lim_{x→1} (2x^2 − 3x + 1)/(x − 1) = 1
Solution: Let ε > 0, suppose |x − 1| < δ, and choose δ = ε/2 > 0. Then
|f(x) − 1| = |(2x^2 − 3x + 1)/(x − 1) − 1| = 2|x − 1| < 2δ = ε.
(b) lim_{x→5} (x^2 − 3x + 1) = 11
Solution: Let ε > 0, suppose |x − 5| < δ, and choose δ = min{1, ε/8}. We can write
|f(x) − 11| = |x^2 − 3x − 10| = |(x − 5)(x + 2)|.
To make this small, we need a bound on the size of x + 2 when x is “close” to 5. For example,
if we arbitrarily require that |x − 5| < 1, then
|x + 2| = |x − 5 + 7| ≤ |x − 5| + 7 < 8.
Figure 1.1: Definition of a Limit
To make f (x) within ε units of 11, we shall want to have |x + 2| < 8 and |x − 5| < ε/8.
Thus, under the above definition of δ
|f (x) − 11| = |(x − 5)(x + 2)| < 8δ ≤ ε.
(c) lim_{x→−2} (x^2 + 2x + 7) = 7
Solution: Let ε > 0, suppose |x + 2| < δ, and choose δ = min{1, ε/3}. We can write
|f(x) − 7| = |x^2 + 2x| = |x(x + 2)|.
To make this small, we need a bound on the size of x when x is “close” to −2. For example,
if we arbitrarily require that |x + 2| < 1, then
|x| − |2| ≤ |x + 2| < 1,
since |a| − |b| ≤ ||a| − |b|| ≤ |a ± b| by the triangle inequality. Thus, |x| < 3, which implies
|f (x) − 7| = |x(x + 2)| < 3δ ≤ ε.
(d) lim_{x→2} x^3 = 8
Solution: Let ε > 0, suppose |x − 2| < δ, and choose δ = min{1, ε/19}. Then
|f(x) − 8| = |x^3 − 8| = |x − 2||x^2 + 2x + 4|
 = |x − 2| · |(x − 2)^2 + 6(x − 2) + 12|
 < δ(δ^2 + 6δ + 12)
 ≤ (ε/19)(1 + 6 + 12)
 = ε.
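As a quick numerical sanity check (my addition, not part of the original notes), the sketch below samples points near x = 2 and verifies the δ = min{1, ε/19} choice from part (d); the tolerance value is illustrative only.

```python
# Numerical check of part (d): lim_{x->2} x^3 = 8 with delta = min{1, eps/19}.
eps = 0.01
delta = min(1.0, eps / 19.0)
# points with 0 < |x - 2| < delta (k = 500 would give x = 2 exactly, so skip it)
xs = [2 - delta + 2 * delta * k / 1000 for k in range(1, 1000) if k != 500]
assert all(abs(x**3 - 8) < eps for x in xs)
print("every sampled x with 0 < |x - 2| <", delta, "satisfies |x^3 - 8| <", eps)
```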
1.1.2 Definition of a Derivative and Tangent Lines
The derivative of y with respect to x at a is, geometrically, the slope of the tangent line to the graph
of f at a. The slope of the tangent line is very close to the slope of the line through (a, f (a)) and
a nearby point on the graph, for example (a + h, f (a + h)). These lines are called secant lines.
Thus, a value of h close to zero will give a good approximation to the slope of the tangent line, and
smaller values (in absolute value) of h will, in general, give better approximations. The slope of the
secant line is the difference between the y values of these points divided by the difference between
the x values. The following definition provides the more customary definition of a derivative.
Definition 1.1.2 (Derivative). The function f : R → R is differentiable at a if
m = lim_{h→0} [f(a + h) − f(a)]/h
exists. This limit is called the derivative of f at a and is written f′(a).
Definition 1.1.3 (Tangent Line). The tangent line to the curve y = f (x) at the point P (a, f (a)) is
the line through P with slope m, provided that the limit exists.
Remark 1.1.1. In general, using standard derivative notation, when a function f of one variable is
differentiable at a point a, the equation of a tangent line to the graph of f at a is:
y = f (a) + f ′ (a)(x − a)
Example 1.1.2. Find an equation of the tangent line to the hyperbola y = 3/x at the point P (3, 1)
using the above definition of a derivative.
Solution: The slope is given by
m = lim_{h→0} [f(3 + h) − f(3)]/h = lim_{h→0} [3/(3 + h) − 1]/h = lim_{h→0} −1/(3 + h) = −1/3.
Therefore, an equation of the tangent line at the point P(3, 1) is
y = 1 − (x − 3)/3  →  x + 3y − 6 = 0.
Example 1.1.3. Find an equation of the tangent line to the curve y = (x − 1)/(x − 2) at the point
P (3, 2) using the above definition of a derivative.
Solution: The slope is given by
m = lim_{h→0} [f(3 + h) − f(3)]/h = lim_{h→0} [(h + 2)/(h + 1) − 2]/h = lim_{h→0} −1/(h + 1) = −1.
Therefore, an equation of the tangent line at the point P(3, 2) is
y − 2 = −(x − 3)  →  x + y = 5.
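The limits above can also be approximated with a shrinking step h; a minimal numerical sketch (my addition, not in the original notes):

```python
# Difference quotients: the secant slopes approach the tangent slopes -1/3 and -1.
def slope(f, a, h):
    return (f(a + h) - f(a)) / h

f1 = lambda x: 3 / x                 # Example 1.1.2 at a = 3
f2 = lambda x: (x - 1) / (x - 2)     # Example 1.1.3 at a = 3
for h in (1e-2, 1e-4, 1e-6):
    print(h, slope(f1, 3, h), slope(f2, 3, h))
```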
1.1.3 Properties of the Differential
Theorem 1.1.1 (Classic properties). Suppose that f : I → R and g : I → R are differentiable at
c ∈ I. Then
(i) If k ∈ R, then the function kf is differentiable at c and
(kf )′ (c) = k · f ′ (c)
(ii) The function f + g is differentiable and
(f + g)′ (c) = f ′ (c) + g ′ (c)
(iii) (Product Rule) The function f g is differentiable at c and
(f g)′ (c) = f (c)g′ (c) + g(c)f ′ (c)
(iv) (Quotient Rule) If g(c) ≠ 0, then f/g is differentiable at c and
(f/g)′(c) = [g(c)f′(c) − f(c)g′(c)] / [g(c)]^2.
Theorem 1.1.2 (Chain Rule). Let I and J be intervals in R, f : I → R, and g : J → R, where
f (I) ⊂ J, and let c ∈ I. If f is differentiable at c and g is differentiable at f (c), then the composite
function g ◦ f is differentiable at c and
(g ◦ f )′ (c) = g′ (f (c)) · f ′ (c).
Example 1.1.4. Using the properties of the differential, differentiate the following, where a, p, q, and b are constants:
(a) y = 1/(x^2 + x + 1)^5
Solution: y′ = −5(2x + 1)/(x^2 + x + 1)^6
(b) y = √(1 + √(1 + √x))
Solution: y′ = 1/[8√x √(1 + √x) √(1 + √(1 + √x))]
(c) y = x^a(px + q)^b
Solution: y′ = y[bp/(px + q) + a/x]
Example 1.1.5. If a(t) and b(t) are positive-valued differentiable functions of t, and if A, α, β are constants, find expressions for ẋ/x, where ẋ = dx/dt:
(a) x = A([a(t)]^α + [b(t)]^β)^{α+β}
Solution: ẋ/x = (α + β)[α a(t)^{α−1} a′(t) + β b(t)^{β−1} b′(t)] / ([a(t)]^α + [b(t)]^β)
(b) x = A[a(t)]^α [b(t)]^β
Solution: ẋ/x = α a′(t)/a(t) + β b′(t)/b(t)
Example 1.1.6. If F(x) = f(x^n g(x)), find a formula for F′(x).
Solution: F′(x) = f′(x^n g(x))[x^n g′(x) + nx^{n−1} g(x)]
Theorem 1.1.3 (L'Hospital's Rule). Suppose f and g are differentiable and g′(x) ≠ 0 near a (except possibly at a). Suppose that
lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0,
or that
lim_{x→a} f(x) = ±∞ and lim_{x→a} g(x) = ±∞.
Then
lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x)
if the limit on the right side exists.
Example 1.1.7. Calculate lim_{x→∞} ln x/∛x.
Solution:
lim_{x→∞} ln x/x^{1/3} = lim_{x→∞} (1/x)/[(1/3)x^{−2/3}] = lim_{x→∞} 3/x^{1/3} = 0.
Example 1.1.8. Calculate lim_{x→∞} e^x/x^2.
Solution:
lim_{x→∞} e^x/x^2 = lim_{x→∞} e^x/(2x) = lim_{x→∞} e^x/2 = ∞.
Theorem 1.1.4 (Inverse Function Theorem). Suppose that f is differentiable on an interval I and f′(x) ≠ 0 (no local maxima or minima) for all x ∈ I. Then f is injective, f^{−1} is differentiable on f(I), and
∂x/∂y = (f^{−1})′(y) = 1/f′(x) = 1/(∂y/∂x),
where y = f(x).
Example 1.1.9. Let n ∈ N and y = f(x) = x^{1/n} for x > 0. Then f is the inverse of the function g(y) = y^n. Use Theorem 1.1.4 to verify the familiar derivative formula for f: f′(x) = (1/n)x^{1/n−1}.
Solution:
f′(x) = 1/(f^{−1})′(y) = 1/g′(y) = 1/(ny^{n−1}) = 1/[n(x^{1/n})^{n−1}] = (1/n)x^{1/n−1}.
Example 1.1.10. Consider the following function:
y = f(θ) = −θ/[(1 − θ) log(1 − θ)].
Using Theorem 1.1.4, find df^{−1}(y)/dy.
Solution:
df^{−1}(y)/dy = dθ/dy = 1/(dy/dθ) = −1/{[log(1 − θ) + θ]/[(1 − θ)^2 log^2(1 − θ)]} = −(1 − θ)^2 log^2(1 − θ)/[log(1 − θ) + θ].
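A quick numerical check of Theorem 1.1.4 applied to Example 1.1.10 (my own sketch, not from the original notes; the test point θ = 0.3 is arbitrary):

```python
import math

# f from Example 1.1.10 and the closed-form derivative of its inverse at y = f(theta).
f = lambda t: -t / ((1 - t) * math.log(1 - t))
theta = 0.3                                          # arbitrary test point in (0, 1)
analytic = -((1 - theta)**2 * math.log(1 - theta)**2) / (math.log(1 - theta) + theta)
h = 1e-6
dy_dtheta = (f(theta + h) - f(theta - h)) / (2 * h)  # central difference for dy/dtheta
print(analytic, 1 / dy_dtheta)                       # the two numbers should agree closely
```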
1.1.4 Single-Variable Maximization
Proposition 1.1.1 (First Derivative Test). Suppose that c is a critical number of a continuous function f .
(a) If f ′ changes from positive to negative at c, then f has a local maximum at c.
(b) If f ′ changes from negative to positive at c, then f has a local minimum at c.
(c) If f ′ does not change sign at c, then f has no local maximum or minimum at c.
If the sign of f′(x) changes from positive to negative (negative to positive) at c, f is increasing (decreasing) to the left of c and decreasing (increasing) to the right of c. It follows that f has a local maximum (minimum) at c.
Proposition 1.1.2 (Second Derivative Test). Suppose f ′′ is continuous near c.
(a) If f′(c) = 0 and f″(c) > 0, then f has a local minimum at c.
(b) If f′(c) = 0 and f″(c) < 0, then f has a local maximum at c.
If f″(c) > 0 (< 0) near c, f is concave upward (downward) near c. Thus, the graph of f lies above (below) its horizontal tangent at c and so f has a local minimum (maximum) at c.
Example 1.1.11. The height of a plant after t months is given by h(t) = √t − t/2, t ∈ [0, 3]. At what time is the plant at its highest?
Solution: The interior critical point is given by
h′(t) = 1/(2√t) − 1/2 = 0  ⇒  t = 1,
where h(1) = 1 − 1/2 = 1/2. At the endpoints, h(0) = 0 and h(3) = √3 − 3/2 ≈ 0.23. Therefore, the plant is at its highest at t = 1 (the slope changes from positive to negative at t = 1).
Example 1.1.12. A sports club plans to charter a plane. The charge for 60 passengers is $800 each.
For each additional person above 60, all travelers get a discount of $10. The plane can take at most
80 passengers.
(a) If 60 + x passengers fly, what is the total cost?
Solution: T C(x) = ($800 − $10x)(60 + x)
(b) Find the number of passengers that maximizes the total airfare paid by the sports club members.
Solution: T C ′ (x) = 200 − 20x. Thus, x∗ = 10 and airfare expenditures are maximized with
70 passengers (T C(x∗ ) = $49, 000).
Example 1.1.13. Let C(Q) be the total cost function for a firm producing Q units of some commodity. A(Q) = C(Q)/Q is then called the average cost function. If C(Q) is differentiable, prove
that A(Q) has a stationary point (critical point) at Q0 > 0 if and only if the marginal cost and the
average cost functions are equal at Q0 . (C ′ (Q0 ) = A(Q0 ))
Solution: By definition, QA(Q) = C(Q). Differentiating with respect to Q yields
QA′ (Q) + A(Q) = C ′ (Q).
Assume A(Q) has a stationary point at Q0, so A′(Q0) = 0. Evaluating the expression above at Q0 shows that A(Q0) = C′(Q0), as desired. Now assume A(Q0) = C′(Q0). Then Q0 A′(Q0) = 0, which implies that A′(Q0) = 0, as desired, since Q0 > 0.
Example 1.1.14. With reference to the previous example, let C(Q) = aQ3 + bQ2 + cQ + d, where
a > 0, b ≥ 0, c > 0, and d > 0. Prove that A(Q) = C(Q)/Q has a minimum in the interval (0, ∞).
Then let b = 0 and find the minimum point in this case.
Solution: The average cost function is given by
A(Q) = C(Q)/Q = aQ^2 + bQ + c + d/Q.
Setting the derivative with respect to Q equal to zero yields
A′(Q) = 2aQ + b − d/Q^2 = 0.
Given the restrictions on the constants, A′(Q) → −∞ as Q → 0⁺ and A′(Q) → +∞ as Q → ∞. Thus, there exists a Q* ∈ (0, ∞) that satisfies the first order condition. To determine whether this critical point is a minimum or maximum, differentiate A′(Q) with respect to Q to obtain
A″(Q) = 2a + 2d/Q^3 > 0,
given the restrictions on the parameters. Thus, A(Q) has a minimum in the interval (0, ∞) by the second derivative test. When b = 0,
d/(Q*)^2 = 2aQ*  →  Q* = (d/(2a))^{1/3}.
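A small numerical sketch (not from the original notes) confirming the closed-form minimizer Q* = (d/(2a))^{1/3} when b = 0; the parameter values are illustrative only:

```python
# Minimize A(Q) = a*Q**2 + c + d/Q on a grid and compare with Q* = (d/(2a))**(1/3).
a, c, d = 2.0, 1.0, 16.0                     # illustrative constants (b = 0)
A = lambda Q: a * Q**2 + c + d / Q
grid = [0.01 * k for k in range(1, 10000)]   # Q in (0, 100)
Q_grid = min(grid, key=A)
Q_star = (d / (2 * a)) ** (1 / 3)
print(Q_grid, Q_star)                        # both approximately 1.59
```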
1.1.5 Intermediate and Mean Value Theorems
Definition 1.1.4 (Intermediate Value Theorem). Suppose that f is continuous on the closed interval
[a, b] and let N be any number between f (a) and f (b), where f (a) ≠ f (b). Then there exists a
number c in (a, b) such that f (c) = N .
Simply put, the Intermediate Value Theorem says that the graph of f must cross any horizontal
line between y = f (a) and y = f (b) at one or more points (see figure 1.2).
Example 1.1.15. Show that 2^x = 3x for some x ∈ (0, 1).
Solution: Define f(x) = 2^x − 3x. Then f is continuous on [0, 1] with f(0) = 1 and f(1) = −1. Thus, by the Intermediate Value Theorem, ∃ x_0 ∈ (0, 1) such that f(x_0) = 0.
Theorem 1.1.5 (Rolle’s Theorem). Let f be a continuous function on [a, b] that is differentiable on
(a, b) such that f (a) = f (b) = 0. Then there exists at least one point c ∈ (a, b) such that f ′ (c) = 0.
The geometric interpretation of Rolle’s theorem is that, if the graph of a differentiable function
touches the x-axis at arbitrary points a and b, where b > a, then for some point c between a and b
there is a horizontal tangent (see figure 1.3a). If we allow the function to have different values at
the endpoints, then we cannot be assured of a horizontal tangent, but there will be a point c ∈ (a, b)
such that the tangent to the graph at x = c will be parallel to the chord between the endpoints of the
graph. This is the essence of the Mean Value Theorem (see figure 1.3b).
Theorem 1.1.6 (Mean Value Theorem). Let f be a continuous function on [a, b] that is differentiable
on (a, b). Then there exists at least one point c ∈ (a, b) such that
f′(c) = [f(b) − f(a)]/(b − a).
Figure 1.2: Intermediate Value Theorem
Example 1.1.16. As an illustration of one use of the Mean Value Theorem (MVT), we will derive Bernoulli's inequality: for x > 0,
(1 + x)^n ≥ 1 + nx for all n ∈ N.
Solution: Let f(t) = (1 + t)^n on the interval [0, x], which is clearly continuous and differentiable. Then, by the MVT, there exists a c ∈ (0, x) such that
f(x) − f(0) = f′(c)(x − 0).
Thus, we have
(1 + x)^n − 1 = nx(1 + c)^{n−1} ≥ nx,
since f′(c) = n(1 + c)^{n−1}, 1 + c > 1, and n − 1 ≥ 0.
Example 1.1.17. Use the Mean Value Theorem (MVT) to establish the following inequalities, assuming any relevant derivative formulas.
(a) e^x > 1 + x for x > 0
Solution: Define f(x) = e^x and recall from the MVT that f(b) − f(a) = f′(c)(b − a) for some c ∈ (0, x). Then
e^x − 1 = e^c(x − 0) = e^c x > x,
since e^c > 1 for c > 0.
(b) 1/8 < √51 − 7 < 1/7
Solution: Define f(x) = √x and consider the interval [49, 51]. Then by the MVT, ∃ c ∈ (49, 51) such that
(√51 − √49)/(51 − 49) = f′(c) = 1/(2√c).
Consequently, we have 1/√c = √51 − 7, which implies
1/8 = 1/√64 < 1/√51 < 1/√c = √51 − 7 < 1/√49 = 1/7.

Figure 1.3: Rolle's Theorem and Mean Value Theorem ((a) Rolle's Theorem; (b) Mean Value Theorem)

(c) √(1 + x) < 5 + (x − 24)/10 for x > 24
Solution: Define f(x) = √(1 + x) and consider the interval [24, x]. Then by the MVT, ∃ c ∈ (24, x) such that
f(x) − f(24) = √(1 + x) − 5 = f′(c)(x − 24) = (x − 24)/(2√(1 + c)) < (x − 24)/10,
since c > 24 implies √(1 + c) > √25 = 5.
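The bound in part (b) is easy to confirm numerically; a one-line check (my addition, not part of the original notes):

```python
import math

# Part (b): the MVT argument gives 1/8 < sqrt(51) - 7 < 1/7.
val = math.sqrt(51) - 7
print(1 / 8 < val < 1 / 7, val)   # True, roughly 0.1414
```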
1.1.6 Taylor Approximations
Definition 1.1.5 (Taylor’s Theorem). Let f and its first n derivatives be continuous on [a, b] and
differentiable on (a, b), and let x0 ∈ [a, b]. Then for each x ∈ [a, b] with x 6= x0 , there exists a point
c between x and x0 such that
f(x) = f(x_0) + f′(x_0)(x − x_0) + [f″(x_0)/2!](x − x_0)^2 + ··· + [f^{(n)}(x_0)/n!](x − x_0)^n + [f^{(n+1)}(c)/(n + 1)!](x − x_0)^{n+1}.
Taylor’s theorem can be viewed as an extension of the MVT in the sense that taking x = b,
x0 = a, and n = 0 in Taylor’s theorem yields the earlier result.
Example 1.1.18. As an illustration of the usefulness of Taylor's theorem in approximations, consider f(x) = e^x for x ∈ R. To find the nth Taylor polynomial for f at x_0 = 0, recall that f^{(n)}(x) = e^x for all n ∈ N. Thus, we have
p_n(x) = 1 + x + x^2/2! + x^3/3! + ··· + x^n/n!.
To find the error involved in approximating e^x by p_n(x), we use the remainder term given by Taylor's theorem. That is,
R_n(x) = [f^{(n+1)}(c)/(n + 1)!] x^{n+1} = e^c x^{n+1}/(n + 1)!,
where c is some number between 0 and x. For example, suppose that we take n = 5 and compute the error when x ∈ [−1, 1]. Since c is also in [−1, 1], a simple calculation shows that |R_5(x)| ≤ e/6! < 0.0038. Thus, for all x ∈ [−1, 1], the polynomial
1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5!
differs from e^x by less than 0.0038.
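The remainder bound |R_5(x)| ≤ e/6! can be compared with the actual approximation error; a short numerical sketch (my addition, not part of the original notes):

```python
import math

# Compare the degree-5 Maclaurin polynomial of e^x with e^x on [-1, 1].
p5 = lambda x: sum(x**k / math.factorial(k) for k in range(6))
errs = [abs(math.exp(x) - p5(x)) for x in [k / 100 for k in range(-100, 101)]]
print(max(errs), math.e / math.factorial(6))   # max error is below e/6! ~ 0.0038
```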
Remark 1.1.2. For the special case x0 = 0, the Taylor Series is known as a Maclaurin series.
Example 1.1.19. Find an approximation to the following function about the given point:
f(x) = (1 + x)^5,  a = 0.
Solution: Applying the above formula,
f(x) ≈ (1 + a)^5 + [5(1 + a)^4/1!](x − a) + [20(1 + a)^3/2!](x − a)^2 + [60(1 + a)^2/3!](x − a)^3 + [120(1 + a)/4!](x − a)^4 + [120/5!](x − a)^5,
which at a = 0 gives
f(x) ≈ 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5.
Example 1.1.20. Find quadratic approximations to the following functions about the given points:
(a) F(K) = AK^α,  K_0 = 1
Solution: F(K) ≈ A[1 + α(K − 1) + (α(α − 1)/2)(K − 1)^2]
(b) f(ε) = (1 + (3/2)ε + (1/2)ε^2)^{1/2},  ε_0 = 0
Solution: f(ε) ≈ 1 + (3/4)ε − (1/32)ε^2
(c) H(x) = (1 + x)^{−1},  x_0 = 0
Solution: H(x) ≈ 1 − x + x^2
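A numerical spot check of the quadratic approximation in part (b) near ε = 0 (my own sketch, not in the original notes):

```python
# Part (b): f(e) = (1 + 1.5*e + 0.5*e**2)**0.5 versus its quadratic approximation.
f = lambda e: (1 + 1.5 * e + 0.5 * e**2) ** 0.5
q = lambda e: 1 + 0.75 * e - e**2 / 32
for e in (0.1, 0.01):
    print(e, f(e), q(e))      # the two values agree up to an O(e^3) error
```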
1.1.7 Laws of Logarithms
Consider a function f with domain A and range B. If a > 0 and a ≠ 1, the exponential function f(x) = a^x is either increasing or decreasing, and thus injective. It therefore has an inverse function f^{−1}, which is called the logarithmic function with base a and is denoted log_a. According to the definition of an inverse function,
f^{−1}(x) = y  ⟺  a^y = x.
Thus, we have
log_a x = y  ⟺  a^y = x.
Moreover, since
f^{−1}(f(x)) = x for every x in A and f(f^{−1}(x)) = x for every x in B,
it is the case that
log_a(a^x) = x for every x ∈ R and a^{log_a x} = x for every x > 0.
The key properties of logarithmic functions are as follows:
(a) log_a(xy) = log_a x + log_a y
(b) log_a(x/y) = log_a x − log_a y
(c) log_a(x^r) = r log_a x (where r is a real number)
Natural Logarithms
The mathematical constant e is the unique real number such that the value of the derivative (the slope of the tangent line) of the exponential function f(x) = a^x at the point x = 0 is exactly 1. The logarithm with base e is called the natural logarithm and has a special notation:
log_e x = ln x.
Thus, the above properties generalize to
ln x = y  ⟺  e^y = x
ln(e^x) = x for every x ∈ R
e^{ln x} = x for every x > 0.
In particular, if we set x = 1, we get ln e = 1. Finally, when y = log_a x, we have a^y = x. Applying the natural logarithm to both sides of this equation gives y ln a = ln x. Thus, the change-of-base formula is
y = log_a x = ln x/ln a.
Example 1.1.21. Express ln a + (1/2) ln b as a single logarithm.
Solution: ln a + (1/2) ln b = ln a + ln b^{1/2} = ln(a√b)
Example 1.1.22. Find the inverse function of the following: m = f(t) = 24 · 2^{−t/25}.
Solution: m/24 = 2^{−t/25} ⇒ ln m − ln 24 = −(t/25) ln 2 ⇒ t = f^{−1}(m) = (25/ln 2)(ln 24 − ln m)
Example 1.1.23. If f(x) = 2x + ln x, find f^{−1}(2).
Solution: Define y = f(x). Then f^{−1}(y) = x. Thus, at y = 2, we have 2x + ln x = 2 ⇒ x = 1. It immediately follows that f^{−1}(2) = 1.
Example 1.1.24. Calculate lim_{x→∞} (1 + 1/x)^x.
Solution: Define y = (1 + 1/x)^x. Then ln y = x ln(1 + 1/x). We must first evaluate the limit of the right-hand side as x → ∞. Using L'Hospital's Rule, we obtain
lim_{x→∞} x ln(1 + 1/x) = lim_{x→∞} ln(1 + 1/x)/(1/x) = lim_{x→∞} 1/(1 + 1/x) = 1.
Thus, ln y → 1 as x → ∞. Since e^x is a continuous function, we have y = e^{ln y} → e as x → ∞.
Example 1.1.25. Find a linear approximation to the following function about the given point:
x_t = x̄ e^{ln(x_t/x̄)} = x̄ e^{ln x_t − ln x̄} ≡ x̄ e^{x̂_t} = f(x̂_t),  a = x̂ = 0.
Then show that log deviations are a good approximation for percent changes.
Solution:
x_t ≈ f(0) + f′(0)(x̂_t − 0) = x̄ e^0 + x̄ e^0(x̂_t − 0) = x̄[1 + x̂_t],
which implies that
x̂_t ≡ ln x_t − ln x̄ ≈ (x_t − x̄)/x̄.
Since the above result is only a first order approximation, it includes an error term. x_t/x̄ is interpreted as a gross deviation of an observation, x_t, from its steady state value (a value near one). Log linearization approximates the percent change, or the net deviation from the steady state value (a value near zero). The approximation becomes less accurate as x_t/x̄ moves away from one. For example, suppose x_t/x̄ = 1.5, a gross deviation of 150% (net deviation of 50%). Log linearization yields a net deviation of 40.55%. Thus, the approximation has an error of nearly 10 percentage points. Figure 1.4 illustrates that the approximation attains more accurate results when each observation is within a short distance of its steady state value.

Figure 1.4: Log-Linear Approximation
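The 40.55% figure in the text is just ln(1.5); a tiny check of how the log deviation compares with the exact percent deviation (my addition, not part of the original notes):

```python
import math

# Log deviation ln(x_t / xbar) versus exact net deviation (x_t - xbar) / xbar.
for gross in (1.01, 1.10, 1.50):
    print(gross, gross - 1, math.log(gross))   # 1.5 -> 0.50 exact vs 0.4055 log approx
```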
1.1.8 Infinite Series
If we add the terms of an infinite sequence {a_n}_{n=1}^∞, we get an expression of the form
a_1 + a_2 + a_3 + ··· + a_n + ···,
which is called an infinite series (or just a series) and is denoted by
∑_{n=1}^∞ a_n.
A natural question is whether it makes sense to talk about the sum of infinitely many terms. It would be impossible to find a finite sum for the series
1 + 2 + 3 + 4 + ··· + n + ···
because if we start adding the terms we get the cumulative sums 1, 3, 6, 10, . . . and after the nth
term, we get n(n + 1)/2, which becomes very large as n increases. However, if we start to add the
terms of the series
1/2 + 1/4 + 1/8 + 1/16 + ··· + 1/2^n + ···
we get 1/2, 3/4, 7/8, 15/16, 31/32, ..., 1 − 1/2^n, ..., which become closer and closer to 1. Using this idea, we can define a new sequence {s_n} of partial sums given by
s_n = ∑_{k=1}^n a_k = a_1 + a_2 + ··· + a_n.
If {s_n} converges to a real number s, we say that the series is convergent and we write
∑_{n=1}^∞ a_n = s.
We also refer to s as the sum of the series ∑_{n=1}^∞ a_n. A series that is not convergent is called divergent. If lim s_n = +∞, we say that the series ∑_{n=1}^∞ a_n diverges to +∞ and we write ∑_{n=1}^∞ a_n = +∞.
Example 1.1.26. For the infinite series ∑_{n=1}^∞ 1/[n(n + 1)], we have the partial sums given by
s_n = 1/(1·2) + 1/(2·3) + 1/(3·4) + ··· + 1/[n(n + 1)]
 = (1/1 − 1/2) + (1/2 − 1/3) + (1/3 − 1/4) + ··· + (1/n − 1/(n + 1))
 = 1 − 1/(n + 1).
This is an example of a telescoping series, so called because of the way in which the terms in the partial sums cancel. Since the sequence of partial sums converges to 1 as n → ∞, we have
∑_{n=1}^∞ 1/[n(n + 1)] = 1.
Example 1.1.27. Show that the harmonic series, given by
∑_{n=1}^∞ 1/n = 1 + 1/2 + 1/3 + 1/4 + ···,
is divergent.
Solution: The partial sums are given by
s_1 = 1
s_2 = 1 + 1/2
s_4 = 1 + 1/2 + (1/3 + 1/4) > 1 + 1/2 + (1/4 + 1/4) = 1 + 2/2
s_8 = 1 + 1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8)
 > 1 + 1/2 + (1/4 + 1/4) + (1/8 + 1/8 + 1/8 + 1/8)
 = 1 + 1/2 + 1/2 + 1/2 = 1 + 3/2.
Similarly, s_16 > 1 + 4/2, s_32 > 1 + 5/2, and in general
s_{2^n} > 1 + n/2.
This shows that s_{2^n} → ∞ as n → ∞ and so {s_n} is divergent. Therefore the harmonic series diverges.
Theorem 1.1.7. If ∑_{n=1}^∞ a_n is a convergent series, then lim a_n = 0.
Proof. If ∑_{n=1}^∞ a_n converges, then the sequence of partial sums (s_n) must have a finite limit. But a_n = s_n − s_{n−1}, so lim_{n→∞} a_n = lim_{n→∞} s_n − lim_{n→∞} s_{n−1} = 0.
Remark 1.1.3. The converse of Theorem 1.1.7 is not true in general. If lim_{n→∞} a_n = 0, we cannot conclude that ∑_{n=1}^∞ a_n converges. Observe that for the harmonic series ∑_{n=1}^∞ 1/n we have a_n = 1/n → 0 as n → ∞, but we showed in Example 1.1.27 that ∑_{n=1}^∞ 1/n is divergent.
Remark 1.1.4 (Test for Divergence). The contrapositive of Theorem 1.1.7, however, provides a useful test for divergence. If lim_{n→∞} a_n does not exist or if lim_{n→∞} a_n ≠ 0, then the series ∑_{n=1}^∞ a_n is divergent.
Example 1.1.28. One of the most useful series is the geometric series,
a + ar + ar^2 + ar^3 + ··· + ar^{n−1} + ··· = ∑_{n=1}^∞ ar^{n−1}.
Each term is obtained from the preceding one by multiplying it by the common ratio r. If r = 1, then s_n = na → ±∞. Since lim_{n→∞} s_n does not exist, the geometric series diverges in this case. If r ≠ 1, we have
s_n = a + ar + ar^2 + ar^3 + ··· + ar^{n−1}
and
rs_n = ar + ar^2 + ar^3 + ··· + ar^{n−1} + ar^n.
Subtracting these equations, we obtain
s_n − rs_n = a − ar^n,
which implies
s_n = a(1 − r^n)/(1 − r).
If −1 < r < 1, then r^n → 0 as n → ∞. Thus
lim_{n→∞} s_n = lim_{n→∞} a(1 − r^n)/(1 − r) = a/(1 − r).
In other words, the sum of a convergent geometric series is the first term divided by one minus the common ratio.
Example 1.1.29. In the previous example, we found that for |r| < 1
∑_{n=1}^∞ ar^{n−1} = ∑_{n=0}^∞ ar^n = a/(1 − r).
Differentiating the above equation with respect to r gives
∑_{n=1}^∞ anr^{n−1} = a/(1 − r)^2.
This result is particularly useful for deriving a closed-form solution for the expected value of a discrete geometric random variable.
Example 1.1.30. Find the sum of each series.
1. ∑_{n=1}^∞ (1/3)^n
Solution: ∑_{n=1}^∞ (1/3)^n = (1/3)/(1 − 1/3) = 1/2
2. ∑_{n=3}^∞ (1/2)^n
Solution: ∑_{n=3}^∞ (1/2)^n = (1/8)/(1 − 1/2) = 1/4
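Partial sums converge quickly to the closed-form values 1/2 and 1/4; a brief numerical sketch (my addition, not in the original notes):

```python
# Partial sums of the two geometric series in Example 1.1.30.
s1 = sum((1 / 3) ** n for n in range(1, 50))   # sum_{n>=1} (1/3)^n  -> 1/2
s2 = sum((1 / 2) ** n for n in range(3, 50))   # sum_{n>=3} (1/2)^n  -> 1/4
print(s1, s2)
```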
1.2 Multivariate Calculus
1.2.1 Level Surfaces
Even though many graphs have a three-dimensional representation, they are often drawn only in two
dimensions for simplicity. The information about elevation can be captured by drawing in a set of
contour lines, with each contour line representing all points with a specific elevation. The contour
lines are not graphs of the function but are instead what are called level curves (level surfaces in
higher dimensions). Level curves are so commonly used in economics that they often have special
names. An indifference curve for a consumer is a level curve containing all bundles of goods that
attain a certain level of utility. An isoquant in production theory is a level curve containing all
bundles of inputs that attain a certain output level.
To see the difference between the graph of a function and the graph of a level curve, consider the function z = f(x, y) = 25 − x^2 − y^2. To find the level curve corresponding to z = 16, we take a plane at height 16, intersect it with the graph, and then project the intersection down to the x-y plane. A level curve always lies in a figure with dimension one less than the graph (a two-dimensional plane versus three-dimensional space). More formally, we have the following definition.
Definition 1.2.1. The graph of f : R^n → R^1 is the set of points in R^{n+1} given by {(x, f(x)) : x ∈ R^n}. The level surface corresponding to f ≡ c (for some c ∈ R) is {x ∈ R^n : f(x) = c}.
1.2.2 Projections
Definition 1.2.2 (Scalar Product). For x, y ∈ R^n, the scalar product (or dot product) of x and y is x · y := ∑_{i=1}^n x_i y_i.
Definition 1.2.3 (Orthogonal Vectors). For x, y ∈ R^n, x and y are orthogonal if x · y = 0.
Definition 1.2.4 (Vector Length). For x ∈ R^n, the length (or norm) of x is ‖x‖ := √(x · x).
Theorem 1.2.1. If θ is the angle between vectors x and y, then
x · y = ‖x‖ ‖y‖ cos θ.
Definition 1.2.5 (Projections).
• Scalar projection of x onto y: comp_y x = (x · y)/‖y‖
• Vector projection of x onto y: proj_y x = [(x · y)/‖y‖^2] y
In order to see these formulas more clearly, note that cos θ = ‖ty‖/‖x‖. Thus,
‖ty‖ = ‖x‖ cos θ = (‖x‖ ‖y‖ cos θ)/‖y‖ = (x · y)/‖y‖,
which is the formula for the scalar projection of x onto y. Moreover, using the fact that ty = ‖ty‖ y/‖y‖, simple algebra yields the vector projection of x onto y.
For t = (x · y)/‖y‖^2, (x − ty) · y = 0 (see figure 1.5). Thus, we can decompose x into two parts, one a multiple of y, ty, and the other orthogonal to y, x − ty.
Given a vector y, which vector x with norm c > 0 maximizes x · y? The set of vectors with ‖x‖ = c consists of all those vectors with heads on the "sphere" (possibly in a higher dimension) with radius c. To simplify the problem assume ‖y‖ = 1. For any x, if the projection of x on y is ty, then
proj_y x ≡ ty = [(x · y)/‖y‖^2] y.
Figure 1.5: Projection of x onto y
Thus, equating coefficients, we obtain x · y = t‖y‖^2 = t. Since x · y = ‖x‖ ‖y‖ cos θ, x · y is maximized when either ‖x‖ or ‖y‖ can be increased or when the angle θ between the two vectors is minimized so that cos θ → 1, which implies the projection ty is maximized. Thus, to maximize x · y we must make the projection on y as large as possible subject to ‖x‖ = c. By the Pythagorean theorem, ‖x‖^2 = ‖ty‖^2 + ‖x − ty‖^2. But ‖x‖^2 = c^2 and ‖ty‖^2 = t^2‖y‖^2 = t^2, so, rearranging, t^2 = c^2 − ‖x − ty‖^2. Since the last term is nonnegative, t ≤ c. To make t as large as possible, set x − ty = 0, so t = c. Thus, to maximize x · y subject to ‖x‖ = c, set x = cy. [If ‖y‖ ≠ 1, we must adjust the solution to x = (c/‖y‖)y.] Intuitively, we obtain the result that in order to maximize the dot product between x and y, both vectors must point in the same direction.
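The projection formulas are easy to verify with NumPy; a minimal sketch (my addition, not from the original notes) that also checks the orthogonal decomposition x = proj_y x + (x − proj_y x):

```python
import numpy as np

x = np.array([3.0, 1.0])
y = np.array([2.0, 0.0])
comp = x @ y / np.linalg.norm(y)            # scalar projection of x onto y
proj = (x @ y / (y @ y)) * y                # vector projection of x onto y
print(comp, proj, (x - proj) @ y)           # residual is orthogonal to y (dot product ~ 0)
```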
1.2.3 Gradient Vector and its Relationship to the Level Surface
For a function f, consider the level surface S given by f(x_1, x_2, ..., x_n) = k through a point P(x_{1,0}, x_{2,0}, ..., x_{n,0}). Let C be any curve that lies on the surface S and passes through the point P, where the curve is described by a continuous vector function, r(t) = ⟨x_1(t), x_2(t), ..., x_n(t)⟩. Let t_0 be the parameter value corresponding to P; that is, r(t_0) = ⟨x_{1,0}, x_{2,0}, ..., x_{n,0}⟩. Since C lies on S, any point (x_1(t), x_2(t), ..., x_n(t)) must satisfy
f(x_1(t), x_2(t), ..., x_n(t)) = k.
By the Chain Rule, its total derivative is
(∂f/∂x_1)(dx_1/dt) + (∂f/∂x_2)(dx_2/dt) + ··· + (∂f/∂x_n)(dx_n/dt) = ∇f · r′(t) = 0.
Since ∇f = ⟨f_{x_1}, f_{x_2}, ..., f_{x_n}⟩ and r′(t) = ⟨x_1′(t), x_2′(t), ..., x_n′(t)⟩, at t = t_0 the above condition can be written
∇f(x_{1,0}, x_{2,0}, ..., x_{n,0}) · r′(t_0) = 0.
Thus, the gradient vector at P , ∇f (x1,0 , x2,0 , . . . , xn,0 ), is perpendicular to the tangent vector r′ (t)
to any curve C on S that passes through P . To illustrate, consider a function f of two variables and
a point P (x0 , y0 ) in its domain. The gradient vector ∇f (x0 , y0 ) gives the direction of the fastest
increase of f and is perpendicular to the level curve f (x, y) = k that passes through P . This makes
intuitive sense since the values of f remain constant as we move along the curve.
Figure 1.6: Gradient Vector Perpendicular to the Level Curve
1.2.4 Gradients and Tangent Planes
Proposition 1.2.1. Assume that f : Rn → R1 is differentiable at a with gradient vector ∇f (a).
The following properties are consequences of differentiability:
(i) f is continuous at a.
(ii) For all unit vectors, u, the directional derivative in the direction of u is fu (a) = ∇f (a) · u.
(iii) For all i, the ith component of ∇f(a) is ∂f(a)/∂x_i.
(iv) The equation of the tangent plane to the graph of f at (a, f (a)) is given by
f (x) − f (a) = ∇f (a) · (x − a).
Note the similarity to the single-variable case.
(v) The equation of the tangent plane to the level curve corresponding to f (x) ≡ f (a) at the
point a is given by
0 = ∇f (a) · (x − a).
(vi) The marginal rate of substitution of x_i for x_j along the level curve corresponding to f(x) ≡ f(a) at the point a is the number of units of x_j which must be removed in order to maintain a constant "output" f when a unit of x_i is added and all other "inputs" are unchanged. The change in input j (i) is v_j (v_i) and all other inputs are unchanged, so v_k = 0 for k ≠ i, j. More formally, we have
0 = [∂f(a)/∂x_i] v_i + [∂f(a)/∂x_j] v_j, or
(∂x_j/∂x_i)|_{x=a, f(x)≡f(a)} = v_j/v_i = −[∂f(a)/∂x_i]/[∂f(a)/∂x_j].
(vii) The direction of change of inputs x which most increases output f (x) starting at a is the
direction ∇f (a).
Example 1.2.1. u(x, y, z) = x + y + z^2 is differentiable at (1, 1, 1).
(a) To find ∇u(1, 1, 1): ∇u(x, y, z) = (1, 1, 2z)′, so ∇u(1, 1, 1) = (1, 1, 2)′.
(b) To find the equation of the tangent plane to the graph of u at (1, 1, 1): u(1, 1, 1) = 3, so
u(x, y, z) − 3 = ∇u(1, 1, 1) · (x − 1, y − 1, z − 1)′ = (x − 1) + (y − 1) + 2(z − 1).
(c) The equation of the tangent plane to the level curve corresponding to u ≡ 3 at (1, 1, 1) is
0 = ∇u(1, 1, 1) · (x − 1, y − 1, z − 1)′ = (x − 1) + (y − 1) + 2(z − 1).
(d) To find the marginal rate of substitution of x for z along the level curve u ≡ 3 at the point (1, 1, 1),
0 = (∂u/∂x)(1, 1, 1)∆x + (∂u/∂z)(1, 1, 1)∆z,
so
(∂z/∂x)|_{(x,y,z)=(1,1,1), u=3} = −[∂u(1, 1, 1)/∂x]/[∂u(1, 1, 1)/∂z] = −1/2.
(e) To find the direction of change of inputs which yields the largest increase in output u at (1, 1, 1), the direction is
∇u(1, 1, 1)/‖∇u(1, 1, 1)‖ = (1/√6)(1, 1, 2)′,
assuming the magnitude of the total change in inputs is unity.
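The claims in Example 1.2.1 can be confirmed numerically; the sketch below (my addition, not part of the original notes) approximates the gradient of u with finite differences and shows that the directional derivative is largest along the normalized gradient:

```python
import numpy as np

u = lambda p: p[0] + p[1] + p[2] ** 2
a = np.array([1.0, 1.0, 1.0])
h = 1e-6
grad = np.array([(u(a + h * e) - u(a - h * e)) / (2 * h) for e in np.eye(3)])
print(grad)                                          # approximately (1, 1, 2)

# Directional derivative is largest in the direction of the (normalized) gradient.
d_grad = grad / np.linalg.norm(grad)
d_other = np.array([1.0, 0.0, 0.0])
dd = lambda d: (u(a + h * d) - u(a - h * d)) / (2 * h)
print(dd(d_grad), dd(d_other))                       # ~2.449 (= ||grad||) vs ~1.0
```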
Example 1.2.2. Output, Y, is produced using two inputs, K and L, according to Y = f(K, L) = 1 + (KL − 1)^{1/3}.
(a) What is the equation of the tangent plane to the production surface at the point corresponding to K = 1 and L = 2?
Solution:
Y = f(1, 2) + ∇f(1, 2) · (K − 1, L − 2)′ = 2 + (2/3)(K − 1) + (1/3)(L − 2)
or, equivalently, Y = (2/3)K + (1/3)L + 2/3.
(b) What is the equation of the tangent plane to the isoquant (level curve) corresponding to Y = 3 at (K, L) = (1, 9)?
Solution:
3 = 3 + ∇f(1, 9) · (K − 1, L − 9)′  →  0 = (3/4)(K − 1) + (1/12)(L − 9)
or, equivalently, 9K + L = 18.
(c) What is the Marginal Rate of Substitution (MRS) of L for K along the isoquant corresponding to Y = f(2, 1) at (K, L) = (2, 1)?
Solution: The equation of the tangent plane to the level curve at (K, L) = (2, 1) is
0 = (∆K, ∆L) · (f_K(2, 1), f_L(2, 1)) = (∆K, ∆L) · (1/3, 2/3),
which implies that MRS = ∆K/∆L = −2.
(d) If starting at (K, L) = (2, 1), a tiny (marginal) amount of inputs could be added in any proportions with ‖(∆K, ∆L)‖ = ε, how many extra units of L should be added for each additional unit of K to maximize the increase in output (i.e., what is the ratio ∆L/∆K)?
Solution: The equation of the tangent plane to the production surface at (2, 1) is
Y − f(2, 1) = (∆K, ∆L) · (1/3, 2/3).
The increase in output is maximized when the scalar product of (∆K, ∆L) and ∇f(2, 1) is maximized. Thus,
(∆K, ∆L) = [ε/‖(1/3, 2/3)‖](1/3, 2/3) = (ε/√5, 2ε/√5)  →  ∆L/∆K = 2.
Example 1.2.3. A firm uses capital, K, and labor, L, to produce output, Y, according to Y = K^α L^β, where α and β are positive constants.
(a) What is the equation of the isoquant corresponding to a level of output equal to one?
Solution: When Y = 1, the corresponding level curve is K^α L^β = 1.
(b) What is the equation of the tangent plane to the production surface at K = L = 1?
Solution:
Y = Y(1, 1) + (∂Y/∂K)(1, 1)(K − 1) + (∂Y/∂L)(1, 1)(L − 1) = 1 + α(K − 1) + β(L − 1)
(c) What is the equation of the tangent “plane” to the level curve corresponding to Y = 1 at
K = L = 1?
Solution:
0 = α(K − 1) + β(L − 1)  →  αK + βL = α + β.
(d) For small changes along the level curve, starting at K = L = 1, how many units of labor are needed to replace each unit of capital (i.e., what is the MRS_{L→K})?
Solution: The equation of the tangent plane to the level curve at (K, L) = (1, 1) is
0 = (∆K, ∆L) · (fK (1, 1), fL (1, 1)) = (∆K, ∆L) · (α, β),
which implies that ∆L/∆K = −α/β.
(e) If it were possible to increase K and L slightly in any proportion so that ||(∆K, ∆L)|| = c,
where c is a very small positive number, what change in K and L would lead to the greatest
increase in output?
Solution: The equation of the tangent plane to the production surface at (1, 1) is
Y = 1 + (∆K, ∆L) · (α, β).
Thus, the maximum value of Y is attained when the above dot product is maximized. This will occur when
(∆K, ∆L) = [c/√(α^2 + β^2)](α, β).
1.2.5 Chain Rule
Definition 1.2.6 (Chain Rule–Case 1). Suppose that z = f(x, y) is a differentiable function of x and y, where x = g(t) and y = h(t) are both differentiable functions of t. Then z is a differentiable function of t and
dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt).
Definition 1.2.7 (Chain Rule–Case 2). Suppose that z = f(x, y) is a differentiable function of x and y, where x = g(s, t) and y = h(s, t) are both differentiable functions of s and t. Then
∂z/∂s = (∂z/∂x)(∂x/∂s) + (∂z/∂y)(∂y/∂s)
and
∂z/∂t = (∂z/∂x)(∂x/∂t) + (∂z/∂y)(∂y/∂t).
Definition 1.2.8 (Chain Rule–General Version). Suppose that u is a differentiable function of the n variables x_1, ..., x_n and each x_j is a differentiable function of the m variables t_1, ..., t_m. Then u is a function of t_1, ..., t_m and
∂u/∂t_i = (∂u/∂x_1)(∂x_1/∂t_i) + (∂u/∂x_2)(∂x_2/∂t_i) + ··· + (∂u/∂x_n)(∂x_n/∂t_i)
for each i = 1, 2, ..., m.
Example 1.2.4. Given z = f(x, y, t) = ye^x + t^2, where x = g(y, t) = ln(y + t) and y = h(t) = t^3 − 9, use the chain rule to find the total effect of a change in t on z (dz/dt) at t = 2.
Solution:
dz/dt = (∂z/∂x)[(∂x/∂y)(dy/dt) + ∂x/∂t] + (∂z/∂y)(dy/dt) + ∂z/∂t
 = ye^x [3t^2/(y + t) + 1/(y + t)] + 3e^x t^2 + 2t.
When t = 2, y = −1 and x = 0, so
dz/dt|_{t=2} = −1(12 + 1) + 12 + 4 = 3.
Example 1.2.5. Assume the following functional forms:
z = x^2 + xy^3,  x = uv^2 + w^3,  y = u + ve^w.
Use the chain rule to find the indicated partial derivatives at the given point:
∂z/∂u, ∂z/∂v, ∂z/∂w, when u = 2, v = 1, w = 0.
Solution: First note that u = 2, v = 1, w = 0 implies x = 2, y = 3. Then
∂z/∂u = (∂z/∂x)(∂x/∂u) + (∂z/∂y)(∂y/∂u) = (2x + y^3)(v^2) + (3xy^2)(1) = 85,
∂z/∂v = (∂z/∂x)(∂x/∂v) + (∂z/∂y)(∂y/∂v) = (2x + y^3)(2uv) + (3xy^2)(e^w) = 178,
∂z/∂w = (∂z/∂x)(∂x/∂w) + (∂z/∂y)(∂y/∂w) = (2x + y^3)(3w^2) + (3xy^2)(ve^w) = 54.
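Example 1.2.5's partials can be verified by composing the functions and differencing numerically; a short sketch (my addition, not part of the original notes):

```python
import math

# Numerical check of dz/du, dz/dv, dz/dw at (u, v, w) = (2, 1, 0) in Example 1.2.5.
def z(u, v, w):
    x = u * v**2 + w**3
    y = u + v * math.exp(w)
    return x**2 + x * y**3

h = 1e-6
point = (2.0, 1.0, 0.0)
for i in range(3):
    up = list(point); up[i] += h
    dn = list(point); dn[i] -= h
    print((z(*up) - z(*dn)) / (2 * h))   # approximately 85, 178, 54
```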
Example 1.2.6. A function is called homogeneous of degree n if it satisfies the equation
f(tx, ty) = t^n f(x, y)
for all t, where n is a positive integer and f has continuous second order partial derivatives.
(a) Verify that f(x, y) = x^2 y + 2xy^2 + 5y^3 is homogeneous of degree 3.
Solution: Applying the above definition,
f(tx, ty) = (tx)^2(ty) + 2(tx)(ty)^2 + 5(ty)^3 = t^3 x^2 y + 2t^3 xy^2 + 5t^3 y^3
 = t^3[x^2 y + 2xy^2 + 5y^3] = t^3 f(x, y).
(b) Show that if f is homogeneous of degree n, then
x ∂f/∂x + y ∂f/∂y = nf(x, y).
Solution: To see this clearly, rewrite the above definition in the following way:
f(a, b) = t^n f(x, y),
where a = tx and b = ty. Then differentiating with respect to t gives
[∂f(tx, ty)/∂a](da/dt) + [∂f(tx, ty)/∂b](db/dt) = nt^{n−1} f(x, y)
→ [∂f(tx, ty)/∂a] x + [∂f(tx, ty)/∂b] y = nt^{n−1} f(x, y).
Since this equation holds for all t, we can set t = 1 to obtain the desired result.
Theorem 1.2.2 (Young's Theorem). Suppose that y = f(x_1, x_2, ..., x_n) is twice continuously differentiable (C^2) on an open region J ⊆ R^n. Then, for all x ∈ J and for each pair of indices i, j,
∂^2f/∂x_i∂x_j (x) = ∂^2f/∂x_j∂x_i (x).
Example 1.2.7. Consider the general Cobb-Douglas production function Q = kx^a y^b. Then,
∂Q/∂x = akx^{a−1}y^b,  ∂Q/∂y = bkx^a y^{b−1},  ∂^2Q/∂x∂y = abkx^{a−1}y^{b−1} = ∂^2Q/∂y∂x.
We can continue taking higher order derivatives, and Young's theorem holds for these cases. For example, if we take an x_1 x_2 x_4 derivative of order three, then the order of differentiation does not matter for a C^3 function. We can keep going and define kth-order partial derivatives and C^k functions. For C^k functions, the order with which one takes the kth partial derivatives does not matter.
Example 1.2.8. If f is homogeneous of degree n, show that
x^2 ∂^2f(x, y)/∂x^2 + 2xy ∂^2f(x, y)/∂x∂y + y^2 ∂^2f(x, y)/∂y^2 = n(n − 1)f(x, y).
Solution: To obtain the desired result, differentiate the following result with respect to t:
[∂f(tx, ty)/∂a] x + [∂f(tx, ty)/∂b] y = nt^{n−1} f(x, y).
Using the chain rule, we obtain
x[∂^2f(tx, ty)/∂a^2](da/dt) + x[∂^2f(tx, ty)/∂a∂b](db/dt) + y[∂^2f(tx, ty)/∂b∂a](da/dt) + y[∂^2f(tx, ty)/∂b^2](db/dt) = n(n − 1)t^{n−2} f(x, y)
→ x^2 ∂^2f(tx, ty)/∂a^2 + 2xy ∂^2f(tx, ty)/∂a∂b + y^2 ∂^2f(tx, ty)/∂b^2 = n(n − 1)t^{n−2} f(x, y).
Again, setting t = 1 gives the desired result.
Example 1.2.9. If f is homogeneous of degree n, show that
f_x(tx, ty) = t^{n−1} f_x(x, y).
Solution: To obtain the desired result, differentiate
f(a, b) = t^n f(x, y)
with respect to x to obtain
[∂f(tx, ty)/∂a](da/dx) = t^n ∂f(x, y)/∂x  →  f_x(tx, ty) = t^{n−1} f_x(x, y),
since da/dx = t. Thus, if a function f is homogeneous of degree n, its partial derivative f_x is homogeneous of degree n − 1. Note that evaluating the function at (tx, ty) and subsequently taking the derivative of f(tx, ty) with respect to its first argument, tx, is equivalent to taking the derivative of f(x, y) with respect to x and then evaluating it at (tx, ty).
Example 1.2.10. Verify that the partial derivative of f(x, y) = x^2 y + 2xy^2 + 5y^3 with respect to x is homogeneous of degree 2.
Solution: The partial derivative is given by
f_x(x, y) = 2xy + 2y^2.
Evaluating at (tx, ty) gives
f_x(tx, ty) = 2(tx)(ty) + 2(ty)^2 = t^2[2xy + 2y^2] = t^2 f_x(x, y).
Thus, the partial derivative of the given function is homogeneous of degree 2.
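Both Euler's relation from Example 1.2.6(b) and the degree-2 homogeneity of f_x from Example 1.2.10 can be spot-checked numerically; a brief sketch (my addition, with arbitrary test values):

```python
# f(x, y) = x^2*y + 2*x*y^2 + 5*y^3 is homogeneous of degree 3.
f = lambda x, y: x**2 * y + 2 * x * y**2 + 5 * y**3
fx = lambda x, y: 2 * x * y + 2 * y**2          # partial derivative with respect to x
fy = lambda x, y: x**2 + 4 * x * y + 15 * y**2  # partial derivative with respect to y

x, y, t = 1.5, 0.7, 2.0
print(f(t * x, t * y), t**3 * f(x, y))          # homogeneity of degree 3
print(x * fx(x, y) + y * fy(x, y), 3 * f(x, y)) # Euler's relation: x f_x + y f_y = 3 f
print(fx(t * x, t * y), t**2 * fx(x, y))        # f_x is homogeneous of degree 2
```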
1.2.6 Second Order Derivatives and Hessians
The Hessian matrix is a square matrix of second-order partial derivatives of a function. Let x ∈ R^n and let f : R^n → R be a real-valued function having 2nd-order partial derivatives in an open set U containing x. Given the real-valued function f(x_1, x_2, ..., x_n), the Hessian matrix of f is the matrix with elements
H(f)_{ij}(x) = D_i D_j f(x),
where x = (x_1, x_2, ..., x_n) and D_i D_j is the differentiation operator with respect to the ijth argument:

H(f) = [ ∂^2f/∂x_1^2     ∂^2f/∂x_1∂x_2   ···   ∂^2f/∂x_1∂x_n
         ∂^2f/∂x_2∂x_1   ∂^2f/∂x_2^2     ···   ∂^2f/∂x_2∂x_n
         ⋮               ⋮               ⋱     ⋮
         ∂^2f/∂x_n∂x_1   ∂^2f/∂x_n∂x_2   ···   ∂^2f/∂x_n^2  ].

If all n^2 second-order partial derivatives of f exist and are continuous functions of (x_1, x_2, ..., x_n), we say that f is twice continuously differentiable or C^2.
The Bordered Hessian matrix of f is a square matrix of second-order partial derivatives that is bordered by first-order partial derivatives. Given the real-valued function f(x_1, x_2, ..., x_n), the bordered Hessian matrix of f is the matrix

       [ 0           ∂f/∂x_1         ∂f/∂x_2         ···   ∂f/∂x_n
         ∂f/∂x_1     ∂^2f/∂x_1^2     ∂^2f/∂x_1∂x_2   ···   ∂^2f/∂x_1∂x_n
         ∂f/∂x_2     ∂^2f/∂x_2∂x_1   ∂^2f/∂x_2^2     ···   ∂^2f/∂x_2∂x_n
         ⋮           ⋮               ⋮               ⋱     ⋮
         ∂f/∂x_n     ∂^2f/∂x_n∂x_1   ∂^2f/∂x_n∂x_2   ···   ∂^2f/∂x_n^2  ].
The importance of the Hessian and Bordered Hessian matrices will become clear in later sections.
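Young's theorem can be seen numerically by building the Hessian with finite differences; a minimal sketch (my addition, not part of the original notes) for the Cobb-Douglas function from Example 1.2.7 with illustrative parameters:

```python
import numpy as np

# Finite-difference Hessian of Q(x, y) = k * x**a * y**b; the matrix should be (nearly) symmetric.
k, a, b = 1.0, 0.3, 0.6
Q = lambda p: k * p[0]**a * p[1]**b
p0 = np.array([2.0, 3.0])
h = 1e-4
H = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i] * h, np.eye(2)[j] * h
        H[i, j] = (Q(p0 + ei + ej) - Q(p0 + ei - ej) - Q(p0 - ei + ej) + Q(p0 - ei - ej)) / (4 * h**2)
print(H)                      # H[0, 1] and H[1, 0] agree (Young's theorem)
```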
1.3 Basic Analysis
1.3.1 Induction and Examples
Theorem 1.3.1 (Principle of Mathematical Induction). Let P (n) be a statement that is either true
or false for each n ∈ N. Then P (n) is true for all n ∈ N, provided that
(a) P (1) is true, and
(b) for each k ∈ N, if P (k) is true, then P (k + 1) is true.
Example 1.3.1. Prove that 1 + 2 + 3 + ··· + n = (1/2)n(n + 1) for every natural number n.
Solution: Let P(n) be the statement
1 + 2 + 3 + ··· + n = (1/2)n(n + 1).
Then P(1) asserts that 1 = (1/2)(1)(1 + 1), P(2) asserts that 1 + 2 = (1/2)(2)(2 + 1), and so on. In particular, we see that P(1) is true, and this establishes the basis for induction. To verify the induction step, we suppose that P(k) is true, where k ∈ N. That is, we assume
1 + 2 + 3 + ··· + k = (1/2)k(k + 1).
Since we wish to conclude that P(k + 1) is true, we add k + 1 to both sides to obtain
1 + 2 + 3 + ··· + k + (k + 1) = (1/2)k(k + 1) + (k + 1)
 = (1/2)[k(k + 1) + 2(k + 1)]
 = (1/2)(k + 1)(k + 2)
 = (1/2)(k + 1)[(k + 1) + 1].
Thus P(k + 1) is true whenever P(k) is true, and by the principle of mathematical induction, we conclude that P(n) is true for all n.
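Induction establishes the formula for every n; a quick numerical confirmation over a finite range (my addition, purely illustrative):

```python
# Check 1 + 2 + ... + n == n(n + 1)/2 for the first few hundred n.
assert all(sum(range(1, n + 1)) == n * (n + 1) // 2 for n in range(1, 500))
print("formula verified for n = 1, ..., 499")
```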
Since the format of a proof using mathematical induction always consists of the same two steps
(establishing the basis for induction and verifying the induction step), it is common practice to
reduce some of the formalism by omitting explicit reference to the statement P (n). It is also acceptable to omit identifying the steps by name.
Example 1.3.2. Prove by induction that 7^n − 4^n is a multiple of 3, for all n ∈ N.
Solution: This is true when n = 1, since 7^1 − 4^1 = 3. Now let k ∈ N and suppose that 7^k − 4^k is a multiple of 3. That is, 7^k − 4^k = 3m for some m ∈ N. It follows that
7^{k+1} − 4^{k+1} = 7^{k+1} − 7 · 4^k + 7 · 4^k − 4 · 4^k
 = 7(7^k − 4^k) + 3 · 4^k
 = 7(3m) + 3 · 4^k
 = 3(7m + 4^k).
Since m and k are natural numbers, so is 7m + 4^k. Thus 7^{k+1} − 4^{k+1} is also a multiple of 3, and by induction we conclude that 7^n − 4^n is a multiple of 3 for all n ∈ N.
In the above example, we have added and subtracted the term 7 · 4^k. Where did it come from? We want somehow to use the induction hypothesis 7^k − 4^k = 3m, so we break 7^{k+1} apart into 7 · 7^k. We would like to have 7^k − 4^k = 3m as a factor instead of just 7^k, but to do this we must subtract (and add) the term 7 · 4^k.
Example 1.3.3. Prove that 12 + 22 + · · · + n2 = 16 n(n + 1)(2n + 1) for all n ∈ N.
Solution: This is true when n = 1, since (1/6)(1)(1 + 1)(2 · 1 + 1) = 1. Now let k ∈ N and
suppose that 1² + 2² + · · · + k² = k(k + 1)(2k + 1)/6. Adding (k + 1)² to both sides, it follows that
$$\begin{aligned}
1^2 + 2^2 + \cdots + k^2 + (k + 1)^2 &= \frac{1}{6}k(k + 1)(2k + 1) + (k + 1)^2 \\
&= \frac{1}{6}[2k^3 + 3k^2 + k] + k^2 + 2k + 1 \\
&= \frac{1}{6}[2k^3 + 9k^2 + 13k + 6] \\
&= \frac{1}{6}[(k + 1)(k + 2)(2k + 3)].
\end{aligned}$$
Thus, the above statement holds for n = k + 1 whenever it holds for n = k, and by the principle of
mathematical induction, we conclude that the statement is true for all n.
Example 1.3.4. Prove that
$$\frac{1}{3} + \frac{1}{15} + \frac{1}{35} + \cdots + \frac{1}{4n^2 - 1} = \frac{n}{2n + 1}, \quad \text{for all } n \in \mathbb{N}.$$
Solution: This is true when n = 1, since 1/(2 · 1 + 1) = 1/3. Now let k ∈ N and suppose that
$$\frac{1}{3} + \frac{1}{15} + \frac{1}{35} + \cdots + \frac{1}{4k^2 - 1} = \frac{k}{2k + 1}.$$
Adding 1/(4(k + 1)² − 1) to both sides, it follows that
$$\begin{aligned}
\frac{1}{3} + \frac{1}{15} + \frac{1}{35} + \cdots + \frac{1}{4k^2 - 1} + \frac{1}{4(k + 1)^2 - 1}
&= \frac{k}{2k + 1} + \frac{1}{4(k + 1)^2 - 1} \\
&= \frac{k}{2k + 1} + \frac{1}{4k^2 + 8k + 3} \\
&= \frac{k}{2k + 1} + \frac{1}{(2k + 1)(2k + 3)} \\
&= \frac{2k^2 + 3k + 1}{(2k + 1)(2k + 3)} \\
&= \frac{k + 1}{2k + 3}.
\end{aligned}$$
Thus, the above statement holds for n = k + 1 whenever it holds for n = k, and by the principle of
mathematical induction, we conclude that the statement is true for all n.
1.3.2 Neighborhoods and Open and Closed Sets
Definition 1.3.1 (Neighborhood). Let x ∈ R and let ε > 0. A neighborhood of x (or an ε-neighborhood of x) is a set of the form
N (x; ε) = {y ∈ R : |x − y| < ε}.
The number ε is referred to as the radius of N (x; ε).
Figure 1.7: Interior and Boundary Points
A neighborhood of x of radius ε is the open interval (x − ε, x + ε) of length 2ε centered at x.
Definition 1.3.2 (Deleted Neighborhood). Let x ∈ R and let ε > 0. A deleted neighborhood of x is
a set of the form
N ∗ (x; ε) = {y ∈ R : 0 < |x − y| < ε}.
Clearly N ∗ (x; ε) = N (x; ε)\{x}.
Neighborhoods give us a framework within which we can talk about “nearness”.
Definition 1.3.3 (Interior and Boundary Points). Let S be a subset of R. A point x in R is an interior
point of S if there exists a neighborhood N of x such that N ⊆ S. If for every neighborhood N
of x, N ∩ S ≠ ∅ and N ∩ (R\S) ≠ ∅, then x is called a boundary point of S. The set of all
interior points of S is denoted int S, and the set of all boundary points of S is denoted by bd S (see
figure 1.7).
Example 1.3.5.
(a) Let S be the open interval (0, 5) and let x ∈ S. If ε = min{x, 5 − x}, then we claim that
N (x; ε) ⊆ S. Indeed, for all y ∈ N (x; ε) we have |y − x| < ε, so that
−x ≤ −ε < y − x < ε ≤ 5 − x
Thus 0 < y < 5 and y ∈ S. That is, for an arbitrary point x in S, we have found a neighborhood of x
that lies completely in S. It follows that every point in S is an interior point of S (S ⊆ int S). Since
the inclusion int S ⊆ S always holds, we have S = int S.
The point 0 is not a member of S, but every neighborhood of 0 will contain positive numbers
in S. Thus 0 is a boundary point of S. Similarly, 5 ∈ bd S and, in fact, bd S = {0, 5}. Note
that none of the boundary of S is contained in S. Of course, there is nothing special about the
open interval (0, 5) in this example. Similar comments would apply to any open interval.
(b) Let S be the closed interval [0, 5]. The point 0 is still a boundary point of S, since every
neighborhood of 0 will contain negative numbers not in S. We have int S = (0, 5) and
bd S = {0, 5}. This time S contains all of its boundary points, and the same could be said of
any other closed interval.
(c) Let S be the interval [0, 5). Then again int S = (0, 5) and bd S = {0, 5}. We see that S
contains some of its boundary, but not all of it.
(d) Let S be the interval [2, ∞). Then int S = (2, ∞) and bd S = {2}. Note that there is no
“point” at ∞ to be included as a boundary point at the right end.
(e) Let S = R. Then int S = S and bd S = ∅.
Example 1.3.6. Find the interior and boundary of the following set: S = {1/n : n ∈ N}.
Solution: The above set is
$$\left\{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\right\}.$$
Given the distance between points, it is clear that there does not exist a neighborhood N around any
point that is contained in S. Thus, int S = ∅. The boundary of the set is bd S = {0} ∪ S.
Example 1.3.7. Find the interior and boundary of each set.
1. [0, 3] ∪ (3, 5). Solution: int S = (0, 5), bd S = {0, 5}.
2. {r ∈ Q : 0 < r < √2}. Solution: int S = ∅, bd S = [0, √2].
3. {r ∈ Q : r ≥ √2}. Solution: int S = ∅, bd S = [√2, ∞).
4. [0, 2] ∩ [2, 4]. Solution: int S = ∅, bd S = {2}.
Definition 1.3.4 (Open and Closed Sets). Let S ⊆ R. If bd S ⊆ S, then S is said to be closed. If
bd S ⊆ R\S, then S is said to be open.
Theorem 1.3.2.
(a) A set S is open iff S = int S. Thus, S is open iff every point in S is an interior point of S.
(b) A set S is closed iff its complement R\S is open.
Example 1.3.8. Classify each of the following sets, S, as open, closed, neither, or both.
1. N. Solution: Not open: int S = ∅ ≠ S. Closed: bd S = S.
2. Q. Solution: Neither: int S = ∅ ≠ S and bd S = R ⊈ S.
3. ∩_{n=1}^∞ (0, 1/n). Solution: Both: (0, 1) ∩ (0, 1/2) ∩ · · · = ∅ (int ∅ = ∅ and bd ∅ = ∅).
4. {x : |x − 5| ≤ 1/2}. Solution: Not open: int S = (4.5, 5.5) ≠ S. Closed: bd S = {4.5, 5.5} ⊆ S.
5. {x : x² > 0}. Solution: Open: int S = R\{0} = S. Not closed: bd S = {0} ⊈ S (0 ∉ S).
6. {1/n : n ∈ N}. Solution: Neither: int S = ∅ ≠ S and bd S ⊈ S (0 ∉ S).
Example 1.3.9. True/False: If S ⊂ R2 is an open set, then f (x, y) = x + y cannot have a global
maximizer subject to (x, y) ∈ S.
Solution: True. Suppose (x∗ , y ∗ ) is a global maximizer of x + y subject to (x, y) ∈ S. Since
S is open, we can always find ε small enough so that an open ε-neighborhood around (x∗ , y ∗ ) is
entirely contained in S. But then a point such as (x∗ + ε/2, y ∗ + ε/2) would lie in S with a value
of the objective function (x* + ε/2) + (y* + ε/2); i.e.,
$$f\left(x^* + \frac{\varepsilon}{2},\, y^* + \frac{\varepsilon}{2}\right) = x^* + y^* + \varepsilon > x^* + y^* = f(x^*, y^*).$$
Thus, (x*, y*) cannot be a global maximizer, and our original assumption must be false. That is, no (x, y) ∈ S is a global maximizer of f (x, y) = x + y.
Theorem 1.3.3.
(a) The union of any collection of open sets is an open set.
(b) The intersection of any finite collection of open sets is an open set.
Proof.
(a) Let each Gi be an open set and let S = ∪_{i=1}^∞ Gi, where G = {G1, G2, . . .} is an arbitrary
collection of open sets. If x ∈ S, then x ∈ Gi for some Gi ∈ G. Since Gi is open, x is an
interior point of Gi . That is, there exists a neighborhood N of x such that N ⊂ Gi . But
Gi ⊂ S, so N ⊂ S, implying that x is an interior point of S. Since we have shown that a
neighborhood of an arbitrary point in S is completely contained in S, S is open.
(b) First note that this result does not hold for infinite collections of sets. To see why, notice that
for each n ∈ N, if we define An = (−1/n, 1/n), then each An is an open set. However,
∩_{n=1}^∞ An = {0}, which is not open. Thus we see that we cannot generalize the above result
to the intersection of an infinite collection of open sets.
Consider the finite case. Define S := ∩_{i=1}^n Gi, where G = {G1, G2, . . . , Gn} is an arbitrary
collection of open sets. If S = ∅, we are done, since ∅ is open (and closed) (int ∅ = ∅). If
S ≠ ∅, let x ∈ S. Then x ∈ Gi for all i = 1, 2, . . . , n. Since each Gi is open, there exist
neighborhoods Ni (x; εi ) of x such that Ni (x; εi ) ⊆ Gi . Let ε = min{ε1 , . . . , εn }. Then
N (x; ε) ⊆ Gi for each i = 1, . . . , n, so N (x; ε) ⊆ S. Thus x is an interior point of S, and S
is open.
Corollary 1.3.1.
(a) The intersection of any collection of closed sets is a closed set.
(b) The union of any finite collection of closed sets is a closed set.
Proof.
(a) To prove the above result, define T := ∩_{i=1}^∞ Fi, where F = {F1, F2, . . .} is an arbitrary
infinite collection of closed sets. If T = ∅, we are done, since ∅ is closed (and open)
(bd ∅ = ∅ ⊆ ∅). Note that R \ (∩_{i=1}^∞ Fi) = ∪_{i=1}^∞ (R \ Fi) (the complement of the intersection is
the union of the individual complements, which can be seen using a simple Venn diagram).
Since a set is closed if and only if its complement is open, R \ (∩_{i=1}^∞ Fi) is therefore a union of
open sets. Since we have shown above that the union of any collection of open sets is open,
R \ (∩_{i=1}^∞ Fi) is open. Thus, ∩_{i=1}^∞ Fi must be closed.

(b) To prove the above result, define T := ∪_{i=1}^n Fi, where F = {F1, F2, . . . , Fn} is an arbitrary
finite collection of closed sets. Note that R \ (∪_{i=1}^n Fi) = ∩_{i=1}^n (R \ Fi) (the complement of the union
is the intersection of the individual complements, which again can be seen using a simple
Venn diagram). Thus, R \ (∪_{i=1}^n Fi) equals an intersection of finitely many open sets. Since we
have shown above that the intersection of any finite collection of open sets is open, we have
that R \ (∪_{i=1}^n Fi) is open. Thus, ∪_{i=1}^n Fi must be closed by definition.
1.3.3 Convergence and Boundedness
A sequence, {sn }, is a function whose domain is the set N of natural numbers. If s is a sequence,
denote its value at n by sn .
Definition 1.3.5 (Convergence). A sequence {sn } is said to converge to the real number s provided
that for each ε > 0, there exists N ∈ R such that for all n ∈ N, n > N implies that |sn − s| < ε. If
{sn } converges to s, then s is called the limit of the sequence {sn }, and we write limn→∞ sn = s
or simply sn → s. If a sequence does not converge to a real number, it is said to diverge.
Example 1.3.10. Prove that lim 1/n = 0.
Solution: Given ε > 0, let N = 1/ε. Then for any n > N , |1/n − 0| = 1/n < 1/N = ε.
Example 1.3.11. Prove that lim(n² + 2n)/(n³ − 5) = 0.
Solution: Given ε > 0, let N = max{3, 4/ε}. Then n > N implies that n > 3 and n > 4/ε.
Since n > 3, we have n² + 2n < 2n² and n³ − 5 > n³/2. Thus for n > N we have
$$\left|\frac{n^2 + 2n}{n^3 - 5} - 0\right| = \frac{n^2 + 2n}{n^3 - 5} < \frac{2n^2}{\frac{1}{2}n^3} = \frac{4}{n} < \frac{4}{N} \le \varepsilon.$$
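The role of N in the definition of convergence can be illustrated numerically: pick an ε, compute N = max{3, 4/ε}, and confirm that the terms beyond N lie within ε of the limit. A small Python sketch (the value of ε is arbitrary):

```python
# Numerical illustration of the epsilon-N argument for s_n = (n**2 + 2*n) / (n**3 - 5).
# For a given epsilon, every term with n > N = max(3, 4/epsilon) lies within epsilon of 0.
eps = 0.01                      # arbitrary tolerance for the illustration
N = max(3, 4 / eps)
for n in range(int(N) + 1, int(N) + 1001):
    s_n = (n**2 + 2*n) / (n**3 - 5)
    assert abs(s_n - 0) < eps
print(f"all checked terms with n > {N:.0f} are within {eps} of the limit 0")
```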
Definition 1.3.6 (Bounded Sequence). A sequence {sn } is said to be bounded if the range {sn :
n ∈ N} is a bounded set. That is if there exists an M ≥ 0 such that |sn | ≤ M for all n ∈ N.
Theorem 1.3.4. Every convergent sequence in Rn is bounded.
Proof. Let sn be a convergent sequence and let sn → s. From the definition of convergence with
ε = 1, we obtain N ∈ R such that |sn − s| < 1 whenever n > N . Thus for n > N the triangle
inequality implies that |sn | < |s| + 1. We know that sn is bounded if the range {sn : n ∈ N} is a
bounded set, that is, if there exists an M ≥ 0 such that |sn | ≤ M for all n ∈ N. Thus, let
M = max{|s1 |, . . . , |sN |, |s| + 1},
(so that M could either be |s| + 1 or the largest absolute value among the first N terms) then we
have |sn | ≤ M for all n ∈ N, so sn is bounded.
Theorem 1.3.5. Suppose that {sn } and {tn } are convergent sequences with lim sn = s and
lim tn = t. Then
(a) lim(sn + tn ) = s + t
(b) lim(ksn ) = ks and lim(k + sn ) = k + s for any k ∈ R
(c) lim(sn tn ) = st
(d) lim(sn /tn ) = s/t, provided that tn ≠ 0 for all n and t ≠ 0
Proof.
(a) To show that sn + tn → s + t, we need to make the difference |(sn + tn ) − (s + t)| small.
Using the triangle inequality, we have
|(sn + tn ) − (s + t)| = |(sn − s) + (tn − t)| ≤ |sn − s| + |tn − t|.
Now, given any ε > 0, since sn → s, there exists N1 such that n > N1 implies that |sn − s| < ε/2.
Similarly, since tn → t, there exists N2 such that n > N2 implies that |tn − t| < ε/2. Thus
if we let N = max{N1 , N2 }, then n > N implies that
$$|(s_n - s) + (t_n - t)| \le |s_n - s| + |t_n - t| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
(b) To show that ksn → ks for any k ∈ R, we need to make the difference |ksn − ks| small. We
have
|ksn − ks| = |k(sn − s)|
= |k||sn − s|.
Now, given any ε > 0, since sn → s, there exists N such that n > N implies that |sn − s| <
ε/|k|. Thus for n > N
$$|ks_n - ks| = |k||s_n - s| < |k|\frac{\varepsilon}{|k|} = \varepsilon.$$
To show that k + sn → k + s, note that
|(k + sn ) − (k + s)| = |sn − s|.
Thus, for n > N , k + sn → k + s.
(c) Using the triangle inequality
|sn tn − st| = |(sn tn − sn t) + (sn t − st)|
≤ |sn tn − sn t| + |sn t − st|
= |sn ||tn − t| + |t||sn − s|.
We know that every convergent sequence is bounded (Theorem 1.3.4). Thus, sn is bounded,
and there exists M1 > 0 such that |sn | ≤ M1 for all n. Letting M = max{M1 , |t|}, we
obtain the inequality
|sn tn − st| ≤ M |tn − t| + M |sn − s|.
Now, given any ε > 0, there exist N1 and N2 such that |tn − t| < ε/(2M ) when n > N1 and
|sn − s| < ε/(2M ) when n > N2 . Let N = max{N1 , N2 }. Then n > N implies that
$$|s_n t_n - st| \le M|t_n - t| + M|s_n - s| < M\frac{\varepsilon}{2M} + M\frac{\varepsilon}{2M} = \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
(d) Since sn /tn = sn (1/tn ), it suffices from part (c) to show that lim 1/tn = 1/t. That is, given
ε > 0, we must show that
$$\left|\frac{1}{t_n} - \frac{1}{t}\right| = \left|\frac{t - t_n}{t_n t}\right| < \varepsilon$$
for all n sufficiently large. To get a lower bound on how small the denominator can be, we
note that since t ≠ 0, there exists an N1 such that n > N1 implies that |tn − t| < |t|/2. Thus
for n > N1 we have
$$|t_n| = |t - (t - t_n)| \ge |t| - |t - t_n| > |t| - \frac{|t|}{2} = \frac{|t|}{2}$$
by the reverse triangle inequality. There also exists an N2 such that n > N2 implies that
|tn − t| < ε|t|²/2. Let N = max{N1 , N2 }. Then n > N implies that
$$\left|\frac{1}{t_n} - \frac{1}{t}\right| = \left|\frac{t - t_n}{t_n t}\right| < \frac{2}{|t|^2}|t_n - t| < \varepsilon.$$
Hence lim(1/tn ) = 1/t.
Theorem 1.3.6. A set S ⊂ Rn is closed if and only if every convergent sequence in S converges to
a point in S.
Proof. Suppose that S is closed and that xn is a sequence in S converging to x. Suppose that x ∉ S.
Because S is closed, Rⁿ\S is open, so that for some ε > 0, there exists N (x; ε) ⊂ Rⁿ\S. That is,
N (x; ε) ∩ S = ∅. Since lim xn = x, there is an N such that xn ∈ N (x; ε), if n > N . But then
xn ∉ S, whenever n > N , which is impossible since xn is a sequence in S. Thus it must be the
case that if a set S is closed, then every convergent sequence in S must converge to a point in S.
Suppose that S is not closed. Then Rⁿ\S is not open, implying that there exists an x ∈ Rⁿ\S
such that N (x; ε) ∩ S ≠ ∅, for every ε > 0. Practically, this means that at least part of the
neighborhood of x lies in S. In particular, for every positive integer n, there exists an xn ∈ S, such
that |xn − x| < 1/n. This gives us a sequence {xn } in S converging to a point x not in S. Thus,
by the contrapositive argument (not closed implies there exists a convergent sequence in S that
converges to a point not in S), we have that if every convergent sequence in S converges to a point
in S, then S must be closed.
1.3.4 Compactness
Theorem 1.3.7 (Heine-Borel). A subset S of Rn is compact iff S is closed and bounded.
Example 1.3.12. Let S ⊂ R2 be defined as follows:
S = {(x, y) ∈ R² : y = sin(1/x), x > 0} ∪ {(0, 0)}
Is S closed? Open? Bounded? Compact?
Solution: None. S is bounded if there exists an ε > 0 such that |si − sj | ≤ ε for all si , sj ∈ S.
Since x ∈ [0, +∞) is unbounded, there does not exist a neighborhood of radius ε that contains
S. Thus, S is not bounded, and hence not compact (Heine-Borel Theorem). In addition, S is not
closed. To see this, recall that a set is closed if every convergent sequence in the set has its limit in
the set. That is, a closed set is a set that is closed under limit operations. Now, consider the sequence
{(xk , yk )}k≥1 in S, given by (xk , yk ) = (2/(π(4k + 1)), 1), for k ≥ 1. Then we have (xk , yk ) ∈ S since
sin(1/xk ) = sin[(4k + 1)(π/2)] = 1 = yk . We can now see that (xk , yk ) → (0, 1) as k → ∞.
But (0, 1) ∉ S, and as a consequence, S is not closed. Also, since any ε-neighborhood around any
point in S is not contained in S, there are no interior points (int S = ∅). Thus, S is not open.
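A short numerical sketch of the sequence used above shows its points sliding along the graph toward (0, 1), which lies outside S:

```python
# The sequence (x_k, y_k) = (2 / (pi * (4k + 1)), 1) lies on the graph y = sin(1/x)
# and converges to (0, 1), which is not in S, so S is not closed.
import math

for k in [1, 10, 100, 1000]:
    x_k = 2 / (math.pi * (4 * k + 1))
    y_k = math.sin(1 / x_k)
    print(k, x_k, y_k)   # x_k -> 0 while y_k stays (numerically) equal to 1
```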
Chapter 2
Linear Algebra
In this chapter, we will look at linear algebra more from a transformational viewpoint than from a
system-of-equations viewpoint (i.e., as functions from one vector space to another).
2.1 Vector Spaces and Subspaces
Definition 2.1.1 (Vector Space). A vector space (or a linear space) V over a field F consists of a
set on which two operations (called addition and scalar multiplication) are defined so that for each
pair of elements x, y in V there is a unique element x + y in V , and for each element a in F and
each element x in V there is a unique element ax in V , such that the following conditions hold:
(VS 1) For all x, y in V , x + y = y + x (commutativity of addition).
(VS 2) For all x, y, z in V , (x + y) + z = x + (y + z) (associativity of addition).
(VS 3) There exists an element in V denoted by 0 such that x + 0 = x for each x in V .
(VS 4) For each element x in V there exists an element y in V such that x + y = 0.
(VS 5) For each element x in V , 1x = x.
(VS 6) For each pair of elements a, b in F and each element x in V , (ab)x = a(bx).
(VS 7) For each element a in F and each pair of elements x, y in V , a(x + y) = ax + ay.
(VS 8) For each pair of elements a, b in F and each element x in V , (a + b)x = ax + bx
The elements of the field F are called scalars and the elements of the vector space V are called
vectors. The following are examples of vector spaces:
1. An object of the form (a1 , a2 , . . . , an ), where the entries ai , i = {1, . . . , n} are elements
from a field F , is called an n-tuple with entries from F . The set of all n-tuples with entries
from F is a vector space, denoted F n , under the operations of coordinate-wise addition and
scalar multiplication; that is, if u = (a1 , a2 , . . . , an ) ∈ F n , v = (b1 , b2 , . . . , bn ) ∈ F n and
c ∈ F , then
u + v = (a1 + b1 , a2 + b2 , . . . , an + bn )
and
cu = (ca1 , ca2 , . . . , can ).
2. The set of all m×n matrices with entries from a field F is a vector space, denoted Mm×n (F ),
under the following operations of addition and scalar multiplication: For A, B ∈ Mm×n (F )
and c ∈ F
(A + B)ij = Aij + Bij and (cA)ij = cAij for 1 ≤ i ≤ m and 1 ≤ j ≤ n.
3. Let S be any nonempty set and F be any field, and let F(S, F ) denote the set of functions
from S to F . The set F(S, F ) is a vector space under the operations of addition and scalar
multiplication defined for f, g ∈ F(S, F ) and c ∈ F by
(f + g)(s) = f (s) + g(s)
and
(cf )(s) = c[f (s)]
for each s ∈ S. Note that these are the familiar operations of addition and scalar multiplication for the functions used in algebra and calculus.
4. Let S = {(a1 , a2 ) : a1 , a2 ∈ R}. For (a1 , a2 ), (b1 , b2 ) ∈ S and c ∈ R, define
(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 − b2 ) and
c(a1 , a2 ) = (ca1 , ca2 ).
Since (VS 1), (VS 2), and (VS 8) fail to hold, S is not a vector space under these operations.
Theorem 2.1.1 (Cancellation Law for Vector Addition). If x, y, and z are elements of a vector space
V such that x + z = y + z, then x = y.
Proof. There exists an element v in V such that z + v = 0 by (VS 4). Thus,
x = x + 0 = x + (z + v) = (x + z) + v
= (y + z) + v = y + (z + v)
=y+0=y
by (VS 2) and (VS 3).
Corollary 2.1.1. V has exactly one zero vector
Proof. By (VS 3), there exists 0 ∈ V such that x + 0 = x, for all x ∈ V . Now suppose that
x + z = x, for all x ∈ V . Then x + 0 = x + z, which implies 0 = z by Theorem 2.1.1.
Corollary 2.1.2. For each x ∈ V , there is exactly one vector y so that x + y = 0.
Proof. Suppose x + z = 0 = x + y. Then z = y by Theorem 2.1.1. This means there is only one
additive inverse to x. Thus, we say y = −x.
Theorem 2.1.2. In any vector space V the following statements are true:
(a) 0x = 0 for each x ∈ V .
(b) (−a)x = −(ax) = a(−x) for each a ∈ F and each x ∈ V .
(c) a0 = 0 for each a ∈ F .
Theorem 2.1.3 (Subspace). Let V be a vector space and W a subset of V . Then W is a subspace
of V if and only if the following three conditions hold for the operations defined in V :
(a) 0 ∈ W .
(b) x + y ∈ W whenever x ∈ W and y ∈ W .
(c) cx ∈ W whenever c ∈ F and x ∈ W .
This theorem provides a simple method for determining whether or not a given subset of a vector
space is a subspace. The following are examples of subspaces:
1. An n × n matrix M is called a diagonal matrix if Mij = 0 whenever i ≠ j. Define
V = Mn×n (R) and W = {A ∈ Mn×n : Aij = 0 whenever i ≠ j}. Clearly, the zero matrix
is a diagonal matrix because all of its entries are 0 and hence belongs to W . Moreover, if A
and B are diagonal n × n matrices, then whenever i ≠ j
(A + B)ij = Aij + Bij = 0 + 0 = 0 and
(cA)ij = cAij = c0 = 0
for any scalar c. Hence A + B and cA are diagonal matrices for any scalar c and also belong
to W . Therefore the set of diagonal matrices, W , is a subspace of V = Mn×n (F ).
2. The transpose At of an m×n matrix A is the n×m matrix obtained from A by interchanging
the rows with the columns; that is, (At )ij = Aji . A symmetric matrix is a matrix A such
that At = A. Define V = Mn×n (R) and W = {A ∈ Mn×n : At = A}. The zero matrix is
equal to its transpose and hence belongs to W . Let A, B ∈ W and c ∈ R, then
(cA + B)t = (cA)t + B t = cAt + B t = cA + B.
Thus, cA + B ∈ W and W is a subspace of V .
3. Let A ∈ Mn×n (R). Define W = {x ∈ Mn×1 (R) : Ax = 0} ⊆ Mn×1 (R) = V . Since
A0 = 0, 0 ∈ W . Also, for any x1 , x2 and c ∈ R
A(cx1 + x2 ) = cAx1 + Ax2 = 0.
Thus, cx1 + x2 ∈ W and W is a subspace of V . This W is referred to as the null space of A.
4. The set of matrices in Mn×n (R) having nonnegative entries is not a subspace of Mn×n (R)
because it is not closed under scalar multiplication.
Theorem 2.1.4. Any intersection of subspaces of a vector space V is a subspace of V
Proof. Let C be a collection of subspaces of V , and let W = ∩_{U∈C} U denote the intersection of the
subspaces in C. Since every subspace contains the zero vector, 0 ∈ W . Let c ∈ F and x, y ∈ W .
Then x, y ∈ U for all U ∈ C. Thus, cx + y ∈ U for all U ∈ C. Hence, cx + y ∈ W and W is a
subspace of V .
2.2 Linear Combinations and Spanning Conditions
Definition 2.2.1 (Linear Combination). Let V be a vector space and S a nonempty subset of V .
A vector v ∈ V is called a linear combination of elements of S if there exist a finite number of
elements u1 , u2 , . . . , un in S and scalars a1 , a2 , . . . , an in F such that v = a1 u1 +a2 u2 +· · ·+an un .
Example 2.2.1. Claim: 2x3 − 2x2 + 12x − 6 is a linear combination of
x3 − 2x2 − 5x − 3
and
3x3 − 5x2 − 4x − 9
in P3 (R). To show this, find scalars a and b such that
2x3 − 2x2 + 12x − 6 = a(x3 − 2x2 − 5x − 3) + b(3x3 − 5x2 − 4x − 9)
= (a + 3b)x3 + (−2a − 5b)x2 + (−5a − 4b)x + (3a − 9b).
Equating coefficients,
$$\begin{pmatrix} 1 & 3 & 2 \\ -2 & -5 & -2 \\ -5 & -4 & 12 \\ -3 & -9 & -6 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & -4 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
Thus a = −4 and b = 2, which proves that 2x3 − 2x2 + 12x − 6 is a linear combination of
x3 − 2x2 − 5x − 3 and 3x3 − 5x2 − 4x − 9.
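The same row reduction can be reproduced with SymPy (a sketch; the rows correspond to the coefficients on x³, x², x, and the constant term):

```python
# Reproducing the row reduction used to find a and b with SymPy.
import sympy as sp

# Columns: coefficients of the two given polynomials; last column: the target polynomial.
M = sp.Matrix([
    [ 1,  3,  2],   # x^3 coefficients
    [-2, -5, -2],   # x^2 coefficients
    [-5, -4, 12],   # x   coefficients
    [-3, -9, -6],   # constant terms
])
R, pivots = M.rref()
print(R)    # last column of the reduced system gives a = -4, b = 2
```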
Definition 2.2.2 (Span). Let S be a nonempty subset of a vector space V . The span of S, denoted
span(S), is the set consisting of all linear combinations of the elements of S. For convenience we
define span(∅) = {0}.
In R3 , for instance, the span of the set {(1, 0, 0), (0, 1, 0)} consists of all vectors in R3 that
have the form a(1, 0, 0) + b(0, 1, 0) = (a, b, 0) for some scalars a and b. Thus the span of
{(1, 0, 0), (0, 1, 0)} contains all the points in the xy-plane. In this case, the span of the set is a
subspace of R3 . This fact is true in general.
Theorem 2.2.1. The span of any subset S of a vector space V is a subspace of V . Moreover, any
subspace of V that contains S must also contain the span of S.
Proof. This result is immediate if S = ∅ because span(∅) = {0}, which is a subspace that is
contained in any subspace of V .
If S ≠ ∅, then S contains an element z and 0z = 0 is an element of span(S). Let x, y ∈
span(S). Then there exist elements u1 , u2 , . . . , um , v1 , v2 , . . . , vn in S and scalars a1 , a2 , . . . , am
and b1 , b2 , . . . , bn such that
x = a1 u1 + a2 u2 + · · · + am um
and
y = b1 v1 + b2 v2 + · · · + bn vn .
Then
x + y = a1 u1 + a2 u2 + · · · + am um + b1 v1 + b2 v2 + · · · + bn vn
and, for any scalar c,
cx = (ca1 )u1 + (ca2 )u2 + · · · + (cam )um
are clearly linear combinations of the elements of S; so x + y and cx are elements of span(S).
Thus, span(S) is a subspace of V .
Now let W denote any subspace of V that contains S. If w ∈ span(S), then w has the form w =
c1 w1 + c2 w2 + · · · + ck wk for some elements w1 , w2 , . . . , wk ∈ S and some scalars c1 , c2 , . . . , ck .
Since S ⊆ W , we have w1 , w2 , . . . , wk ∈ W . Therefore w = c1 w1 + c2 w2 + · · · + ck wk is
an element of W (since W is a subspace of V , it is closed under addition and scalar multiplication).
Since w, an arbitrary element of span(S), belongs to W , it follows that span(S) ⊆ W .
Example 2.2.2. Suppose V = R3 and S = {(0, −2, 2), (1, 3, −1)}. Is (3, 1, 5) ∈ span(S)? To
answer this question, try to find constants a and b so that
a(0, −2, 2) + b(1, 3, −1) = (3, 1, 5).
Equating coefficients,
$$\begin{pmatrix} 0 & 1 & 3 \\ -2 & 3 & 1 \\ 2 & -1 & 5 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 4 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix}.$$
Thus a = 4 and b = 3, which proves that (3, 1, 5) ∈ span(S).
Definition 2.2.3. A subset S of a vector space V generates (or spans) V if span(S) = V . In this
situation we may also say that the elements of S generate (or span) V .
Example 2.2.3.
1. Let V = Rn and S = {e1 , e2 , . . . , en }, where ej denotes a vector whose jth coordinate is 1
and whose other coordinates are 0. Since x = (x1 , x2 , . . . , xn ) ∈ Rn can be written
$$x = x_1(1, 0, 0, \ldots, 0) + x_2(0, 1, 0, \ldots, 0) + \cdots + x_n(0, 0, \ldots, 0, 1) = \sum_{k=1}^{n} x_k e_k \in \operatorname{span}(S).$$
Hence span(S) = Rⁿ and S generates Rⁿ.
2. Let V = Pn (R) = {a0 + a1 x + · · · + an xn : each ak ∈ R} and S = {1, x, x2 , . . . , xn }.
Clearly span(S) ⊆ Pn (R). Also, for any p(x) ∈ Pn (R)
p(x) = a0 + a1 x + · · · + an xn
= a0 (1) + a1 (x) + · · · + an (xn ) ∈ span(S).
Hence span(S) = Pn (R) and S generates Pn (R).
3. Let V = M2×2 (R) and
$$S = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}.$$
Since
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + a_{12}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + a_{21}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + a_{22}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \in \operatorname{span}(S)$$
and A ∈ M2×2 (R), S generates M2×2 (R).
4. Let V = R3 and S = {(1, 1, 0), (1, 0, 1), (0, 1, 1)}. Let x ∈ span(S). Then for a, b, c ∈ R
x = a(1, 1, 0) + b(1, 0, 1) + c(0, 1, 1) = (a + b, a + c, b + c) ∈ R3 .
Thus, span(S) ⊆ R3 . Now let x = (x1 , x2 , x3 ) ∈ R3 , then we must find a, b, c ∈ R
satisfying
a(1, 1, 0) + b(1, 0, 1) + c(0, 1, 1) = (x1 , x2 , x3 ).
37
A. W. Richter
2.3. LINEAR INDEPENDENCE AND LINEAR DEPENDENCE
Equating coefficients,
$$\left(\begin{array}{ccc|c} 1 & 1 & 0 & x_1 \\ 1 & 0 & 1 & x_2 \\ 0 & 1 & 1 & x_3 \end{array}\right) \overset{\text{rref}}{\sim} \left(\begin{array}{ccc|c} 1 & 0 & 0 & (x_1 + x_2 - x_3)/2 \\ 0 & 1 & 0 & (x_1 - x_2 + x_3)/2 \\ 0 & 0 & 1 & (-x_1 + x_2 + x_3)/2 \end{array}\right),$$
which is consistent. Hence x ∈ span(S), R3 ⊆ span(S), and S generates R3 .
2.3 Linear Independence and Linear Dependence
Definition 2.3.1 (Linear Dependence). A subset S of a vector space V is called linearly dependent
if there exist a finite number of distinct vectors u1 , u2 , . . . , un in S and scalars a1 , a2 , . . . , an , not
all zero, such that
a1 u1 + a2 u2 + · · · + an un = 0.
In this case we say that the elements of S are linearly dependent.
For any vectors u1 , u2 , . . . , un we have a1 u1 + a2 u2 + · · · + an un = 0 if a1 = a2 = · · · =
an = 0. We call this the trivial representation of 0 as a linear combination of u1 , u2 , . . . , un .
Thus for a set to be linearly dependent means that there is a nontrivial representation of 0 as a
linear combination of vectors in the set. Consequently, any subset of a vector space that contains
the zero vector is linearly dependent, because 0 = 1 · 0 is a nontrivial representation of 0 as a linear
combination of vectors in the set.
Definition 2.3.2 (Linear Independence). A subset S of a vector space that is not linearly dependent
is called linearly independent. As before we also say that the elements of S are linearly independent.
The following facts about linearly independent sets are true in any vector space.
1. The empty set is linearly independent, for linearly dependent sets must be nonempty.
2. A set consisting of a single nonzero vector is linearly independent. For if {u} is linearly
dependent, then au = 0 for some nonzero scalar a. Thus
u = a−1 (au) = a−1 0 = 0.
3. A set is linearly independent if and only if the only representations of 0 as linear combinations
of its elements are trivial representations.
The condition in 3 provides a useful method for determining if a finite set is linearly independent.
This technique is illustrated in the following example.
Example 2.3.1. Determine whether the following sets are linearly dependent or linearly independent.
1. In P2 (R), let S = {3 + x + x², 2 − x + 5x², 4 − 3x²}. Consider the equation:
a(3 + x + x²) + b(2 − x + 5x²) + c(4 − 3x²) = (3a + 2b + 4c) + (a − b)x + (a + 5b − 3c)x² = 0
for all x. Equating coefficients,
$$\begin{pmatrix} 3 & 2 & 4 \\ 1 & -1 & 0 \\ 1 & 5 & -3 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Thus, the only solution is a = b = c = 0, which implies that S is linearly independent in
P2 (R).
2. In M2×2 (R), let
$$S = \left\{ \begin{pmatrix} 1 & -3 \\ -2 & 4 \end{pmatrix}, \begin{pmatrix} -2 & 6 \\ 4 & -8 \end{pmatrix} \right\}.$$
Consider the equation:
$$a\begin{pmatrix} 1 & -3 \\ -2 & 4 \end{pmatrix} + b\begin{pmatrix} -2 & 6 \\ 4 & -8 \end{pmatrix} = \begin{pmatrix} a - 2b & -3a + 6b \\ -2a + 4b & 4a - 8b \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
Equating coefficients,
$$\begin{pmatrix} 1 & -2 \\ -3 & 6 \\ -2 & 4 \\ 4 & -8 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & -2 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix},$$
which implies that a = 2b. Therefore, there are infinitely many nontrivial solutions and so S
is linearly dependent in M2×2 (R).
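These independence checks can be automated: stack the (vectorized) elements of S as the columns of a matrix and compare its rank with |S|. A SymPy sketch for part 2:

```python
# Linear independence test: the elements of S (vectorized) are independent
# exactly when the rank of the matrix having them as columns equals |S|.
import sympy as sp

# Example 2.3.1, part 2: the two matrices vectorized as columns (a11, a12, a21, a22).
M = sp.Matrix([
    [ 1, -2],
    [-3,  6],
    [-2,  4],
    [ 4, -8],
])
print(M.rank())        # 1 < 2, so the set is linearly dependent
print(M.nullspace())   # spans the nontrivial solutions, consistent with a = 2b
```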
2.4 Bases and Dimension
Definition 2.4.1 (Basis). A basis β for a vector space V is a linearly independent subset of V that
generates V (i.e. β is linearly independent and span(β) = V ).
Theorem 2.4.1. Let V be a vector space and β = {u1 , u2 , . . . , un } be a subset of V . Then β is a
basis for V if and only if each vector v ∈ V can be uniquely expressed as a linear combination of vectors
in β, that is, can be expressed in the form
v = a1 u1 + a2 u2 + · · · + an un
for unique scalars a1 , a2 , . . . , an .
Proof. First let β be a basis for V . If v ∈ V , then v ∈ span(β) because span(β) = V . Thus v is a
linear combination of the elements in β. Suppose that
v = a1 u1 + a2 u2 + · · · + an un
and
v = b1 u1 + b2 u2 + · · · + bn un
are two such representations of v. Subtracting the second equation from the first gives
(a1 − b1 )u1 + (a2 − b2 )u2 + · · · + (an − bn )un = 0.
Since β is linearly independent, it follows that a1 − b1 = a2 − b2 = · · · = an − bn = 0. Hence
a1 = b1 , a2 = b2 , . . . , an = bn and so v is uniquely expressible as a linear combination of the
elements in β.
Now suppose that every v ∈ V can be uniquely expressed as a linear combination of vectors in β. Note
first that β is trivially a generating set for V , since we are told each v ∈ V is also in span(β). Since
v = a1 u1 + a2 u2 + · · · + an un = b1 u1 + b2 u2 + · · · + bn un , ai = bi for each i such that 1 ≤ i ≤ n,
otherwise each v ∈ V would not be uniquely expressible as a linear combination of vectors from
β. This implies that ai − bi = 0 for each i, proving that β is linearly independent. β is therefore a
basis for V .
Example 2.4.1.
1. Since span(∅) = {0} and ∅ is linearly independent, ∅ is a basis for the vector space {0}.
2. In F n , {e1 , e2 , . . . , en }, where ej denotes a vector whose jth coordinate is 1 and whose other
coordinates are 0 is a basis for F n and is called the standard basis.
3. In Mm×n (F ), let M ij denote the matrix whose only nonzero entry is a 1 in the ith row and
jth column. Then {M ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mm×n (F ).
4. In Pn (F ) the set {1, x, x2 , . . . , xn } is a basis. We call this basis the standard basis for
Pn (F ).
5. In P (F ) the set {1, x, x2 , . . .} is a basis.
Theorem 2.4.2 (Extraction). If a vector space V is generated by a finite set S, then some subset of
S is a basis for V . Hence V has a finite basis.
Proof. If S = ∅ or S = {0}, then V = {0} and ∅ is a subset of S that is a basis for V . Otherwise
suppose there exists u1 ∈ S with u1 ≠ 0. Define S1 = {u1 }. Then S1 ⊆ S and S1 is independent.
If span(S1 ) = V , we are done. If not, then there exists a u2 ∈ S where u2 ∉ span(S1 ). Define
S2 = S1 ∪ {u2 }. Then S2 is independent. If span(S2 ) = V , we are done. If not, . . . , then there
exists a um ∈ S where um ∉ span(Sm−1 ). Define Sm = Sm−1 ∪ {um }. Then Sm is independent.
Since S is finite, this process must stop: if k > m and uk ∈ S, then uk ∈ span(Sm ). Thus
span(Sm ) = V and Sm ⊆ S is a basis for V .
Example 2.4.2. Find a basis for the following sets:
1. Let V = R3 and S = {(2, −1, 4), (1, −1, 3), (1, 1, −1), (1, −2, 1)}. Let x = (x1 , x2 , x3 ) ∈
R3 , then we must find a, b, c, d ∈ R satisfying
a(2, −1, 4) + b(1, −1, 3) + c(1, 1, −1) + d(1, −2, 1) = (x1 , x2 , x3 ).
Equating coefficients,
$$\begin{pmatrix} 2 & 1 & 1 & 1 \\ -1 & -1 & 1 & -2 \\ 4 & 3 & -1 & 1 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
which is consistent for all x. Hence x ∈ span(S) and S generates R3 . By Theorem 2.4.2,
S \ {(1, 1, −1)} is a basis for R3 .
2. Let V = M2×2 (R) and
$$S = \left\{ \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 5 \\ 5 & 1 \end{pmatrix}, \begin{pmatrix} 2 & 3 \\ 3 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} \right\}.$$
Form the coefficient matrix (one column per element of S, with entries ordered a11 , a12 , a21 , a22 ):
$$\begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 2 & 1 & 5 & 3 & 2 \\ 2 & 1 & 5 & 3 & 3 \\ 1 & 2 & 1 & 1 & 2 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 3 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix},$$
which is consistent. Hence span(S) = M2×2 (R). By Theorem 2.4.2, S \ {S3 } is a basis for
M2×2 (R).
3. Let V = P2 (R) and S = {1 + x, 1 − x, 1 + x + x², 1 − x + x², x + x²}. Form the coefficient
matrix (one column per polynomial, with rows for the coefficients of 1, x, and x²):
$$\begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & -1 & 1 & -1 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 0 & -1 & -1/2 \\ 0 & 1 & 0 & 1 & -1/2 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix},$$
which is consistent. Hence span(S) = P2 (R). By Theorem 2.4.2, S \ {1 − x + x², x + x²}
is a basis for P2 (R).
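In each case, the basis-extraction step of Theorem 2.4.2 amounts to reading off the pivot columns of the coefficient matrix; the non-pivot columns are the redundant elements. A SymPy sketch for the P2 (R) example above:

```python
# Basis extraction (Theorem 2.4.2): pivot columns of the coefficient matrix
# index the elements of S that form a basis; non-pivot columns are redundant.
import sympy as sp

# Columns: coefficients (constant, x, x^2) of 1+x, 1-x, 1+x+x^2, 1-x+x^2, x+x^2.
M = sp.Matrix([
    [1,  1, 1,  1, 0],
    [1, -1, 1, -1, 1],
    [0,  0, 1,  1, 1],
])
R, pivots = M.rref()
print(pivots)   # (0, 1, 2): the first three polynomials form a basis of P2(R)
```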
Theorem 2.4.3. Let β = {v1 , v2 , . . . , vn } be a basis for V and suppose S = {u1 , u2 , . . . , um } ⊆
V . If m = |S| > n = |β|, then S is dependent.¹
Proof. For each uj , 1 ≤ j ≤ m, there exist unique scalars aij , 1 ≤ i ≤ n, so that
$$u_j = a_{1j}v_1 + a_{2j}v_2 + \cdots + a_{nj}v_n = \sum_{i=1}^{n} a_{ij}v_i.$$
If we form $\sum_{j=1}^{m} x_j u_j = 0$, then
$$\sum_{j=1}^{m} x_j\left(\sum_{i=1}^{n} a_{ij}v_i\right) = \sum_{i=1}^{n}\left(\sum_{j=1}^{m} a_{ij}x_j\right)v_i = 0.$$
Since β is independent, $\sum_{j=1}^{m} a_{ij}x_j = 0$ for each 1 ≤ i ≤ n. Set
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}.$$
Then AX = 0. Since m > n, we will always have at least one free variable and so some xj ≠ 0.
Thus S is dependent.
Definition 2.4.2. A vector space is called finite-dimensional if it has a basis consisting of a finite
number of elements. The unique number of elements in each basis for V is called the dimension of
V and is denoted dim(V ). A vector space that is not finite-dimensional is called infinite-dimensional.
¹For any set S, |S| corresponds to the number of vectors in S.
Example 2.4.3.
1. dim(F n ) = n
2. dim(Pn ) = n + 1 (note the constant term)
3. dim(Mm×n ) = mn
4. dim({0}) = 0
5. dim(P ) = ∞
Corollary 2.4.1. Let V be a vector space, dim(V ) = n < ∞, and S ⊆ V . If |S| < n, then
span(S) ≠ V .
Proof. Suppose span(S) = V and |S| < n. By Theorem 2.4.2, there exists β ⊆ S so that β is a
basis for V . Then n = |β| ≤ |S| < n, which is a contradiction.
Remark 2.4.1. Let V be a vector space, dim(V ) = n < ∞, and S ⊆ V . Then these results
directly follow from earlier results:
1. If S is independent, then |S| ≤ dim(V ) (contrapositive of Theorem 2.4.3).
2. If S generates V (span(S) = V ), then |S| ≥ dim(V ) (contrapositive of Corollary 2.4.1).
Corollary 2.4.2. Let V be a vector space, dim(V ) = n < ∞, S ⊆ V , and |S| = dim(V ). Then
1. If span(S) = V , then S is a basis for V .
2. If S is linearly independent, then S is a basis for V .
Proof.
1. Since span(S) = V , by Theorem 2.4.2 there exists β ⊆ S so β is a basis for V . Thus,
|β| = dim(V ) = |S| and so β = S.
2. On the contrary, suppose span(S) ≠ V . Then there exists v ∈ V so v ∉ span(S). Hence
S ∪ {v} is independent. This, however, is a contradiction to Theorem 2.4.3, which says that
if |S ∪ {v}| > dim(V ), then S ∪ {v} is dependent. Therefore span(S) = V and so S is a
basis.
Example 2.4.4. Do the polynomials x³ − 2x² + 1, 4x² − x + 3, and 3x² generate P3 (R)? No: since
|S| = 3 < 4 = dim(P3 (R)), span(S) ≠ P3 (R) (Corollary 2.4.1).
Example 2.4.5. Determine whether S = {(1, 0, −1), (2, 5, 1), (0, −4, 3)} form a basis for R3 .
Since |S| = 3 = dim(R3 ), it is sufficient to show that S is linearly independent (Corollary 2.4.2).
Equating coefficients,
$$\begin{pmatrix} 1 & 2 & 0 \\ 0 & 5 & -4 \\ -1 & 1 & 3 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Hence S is linearly independent and forms a basis for R3 .
Example 2.4.6. Find a basis for the following subspaces of F 5 :
W = {(a1 , a2 , a3 , a4 , a5 ) ∈ F 5 : a1 − a3 − a4 = 0}.
What is the dimension of W ? Note that (a1 , a2 , a3 , a4 , a5 ) ∈ W if and only if a1 = a3 + a4 , with
a2 , a3 , a4 , a5 free; that is, (a1 , a2 , a3 , a4 , a5 ) = (r + s, t, r, s, u) for some r, s, t, u ∈ F . Thus, a
spanning set for W is
β = {(1, 0, 1, 0, 0), (1, 0, 0, 1, 0), (0, 1, 0, 0, 0), (0, 0, 0, 0, 1)}.
Since
$$\begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
has four nonzero rows, the spanning set, β, is linearly independent and thus forms a basis for W , and dim(W ) = 4.
2.5 Linear Transformations
In the previous sections, we developed the theory of abstract vector spaces in detail. It is now natural
to consider those functions defined on vector spaces that in some sense “preserve” the structure.
These special functions are called linear transformations.
Definition 2.5.1 (Linear Transformation). Let V and W be vector spaces over F . We call a function
T : V → W a Linear Transformation from V into W if for all x, y ∈ V and c ∈ F we have
(a) T (x + y) = T (x) + T (y)
(b) T (cx) = cT (x)
Theorem 2.5.1. The following are basic facts about the function T : V → W :
1. If T is linear, then T (0) = 0.
2. T is linear if and only if T (x − y) = T (x) − T (y).
3. T is linear if and only if T (cx + y) = cT (x) + T (y) for all x, y ∈ V and c ∈ F .
4. T is linear if and only if for x1 , . . . , xn ∈ V and a1 , . . . , an ∈ F we have
$$T\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i T(x_i).$$
Proof. Let T : V → W be linear. Then T (0) = 0, since by linearity T (0) = T (x + (−x)) =
T (x) + T (−x) = T (x) − T (x) = 0. Also note that since T is linear,
T (cx + y) = T (cx) + T (y) = cT (x) + T (y), and T (x − y) = T (x + (−y)) = T (x) + T (−y) =
T (x) − T (y). To prove property 4, note that if T is linear, an inductive argument can be used to show
that for all x1 , . . . , xn ∈ V and a1 , . . . , an ∈ F , we have
$$T\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} T(a_i x_i) = \sum_{i=1}^{n} a_i T(x_i).$$
Now, assume T (cx + y) = cT (x) + T (y) for all x, y ∈ V and c ∈ F . Let x, y ∈ V . Then we
obtain T (x + y) = T (1x + y) = 1 · T (x) + T (y) = T (x) + T (y). Next, let x ∈ V and c ∈ F .
Then we obtain T (cx) = T (cx + 0) = c · T (x) + T (0) = c · T (x). This proves T is linear. The
same type of reasoning can be used to show that if T satisfies
$$T\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i T(x_i),$$
then T must be linear.
Example 2.5.1. Given A ∈ Mm×n (F ), define LA : Mn×1 → Mm×1 (F ) by LA (x) = Ax. To see
that this is a linear transformation, note that for any x, y ∈ Mn×1 and any c ∈ R
LA (cx + y) = A(cx + y) = c(Ax) + Ay = cLA (x) + LA (y).
Example 2.5.2. Define T : R2 → R2 by T (a1 , a2 ) = (2a1 + a2 , a1 ). To show that T is linear, let
c ∈ R and x, y ∈ R2 , where x = (b1 , b2 ) and y = (d1 , d2 ). Since
cx + y = (cb1 + d1 , cb2 + d2 ),
we have
T (cx + y) = (2(cb1 + d1 ) + cb2 + d2 , cb1 + d1 ).
Also
cT (x) + T (y) = c(2b1 + b2 , b1 ) + (2d1 + d2 , d1 )
= (2cb1 + cb2 + 2d1 + d2 , cb1 + d1 )
= (2(cb1 + d1 ) + cb2 + d2 , cb1 + d1 ).
Thus T is linear.
Example 2.5.3. Define T : Pn (R) → Pn−1 (R) by T (f ) = f ′ , where f ′ denotes the derivative of
f . To show that T is linear, let g and h be vectors in Pn (R) and a ∈ R. Then
T (ag + h) = (ag + h)′ = ag ′ + h′ = aT (g) + T (h).
Thus T is linear.
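Once bases are fixed, a linear transformation such as T (f ) = f ′ acts as a matrix on coefficient vectors. The sketch below writes the derivative map on P3 (R) relative to the standard bases; the coefficient ordering (constant term first) is an assumption of the sketch.

```python
# The linear map T(f) = f' on P_3(R), written in the standard bases.
# Coefficient vectors are ordered (a0, a1, a2, a3) for a0 + a1 x + a2 x^2 + a3 x^3.
import numpy as np

D = np.array([
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 3],
])                          # 3 x 4 matrix representing differentiation P_3 -> P_2

p = np.array([5, 1, -2, 4])     # 5 + x - 2x^2 + 4x^3
print(D @ p)                    # [ 1 -4 12 ]  i.e.  1 - 4x + 12x^2  = p'(x)
```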
We now turn our attention to two very important sets associated with linear transformations:
the range and null space. The determination of these sets allows us to examine more closely the
intrinsic properties of linear transformations.
Definition 2.5.2 (Null Space and Range). Let V and W be vector spaces and let T : V → W be
linear. We define the null space (or kernel) N (T ) of T to be the set of all vectors x in V such that
T (x) = 0; that is, N (T ) = {x ∈ V : T (x) = 0}.
We define the range (or image) R(T ) of T to be the subset of W consisting of all images (under
T ) of elements of V ; that is R(T ) = {T (x) : x ∈ V }.
Definition 2.5.3 (Nullity and Rank). Let V and W be vector spaces and let T : V → W be linear.
If N (T ) and R(T ) are finite-dimensional, then we define the nullity of T , denoted nullity(T ), and
the rank of T , denoted rank(T ), to be the dimensions of N (T ) and R(T ), respectively.
As an illustration of these definitions, consider a figure of T : V → W in which the null space N (T ) ⊆ V is mapped to 0W and the image of V is the range R(T ) ⊆ W (figure omitted).
The next theorem provides a method for finding a spanning set for the range of a linear transformation. With this accomplished, a basis for the range is easy to discover.
Theorem 2.5.2. Let V and W be vector spaces, and let T : V → W be linear. If β =
{v1 , . . . , vn } is a finite basis for V , then
R(T ) = span({T (v1 ), . . . , T (vn )}).
Proof. Clearly T (vi ) ∈ R(T ) for each i. Because R(T ) is a subspace of W that contains the set
{T (v1 ), . . . , T (vn )}, by Theorem 2.2.1 R(T ) contains span({T (v1 ), . . . , T (vn )}). Now suppose
that w ∈ R(T ). Then w = T (v) for some v ∈ V . Because β is a basis for V , we have
$$v = \sum_{i=1}^{n} a_i v_i \quad \text{for some } a_1, \ldots, a_n \in F.$$
Since T is linear, it follows that
$$w = T(v) = \sum_{i=1}^{n} a_i T(v_i) \in \operatorname{span}(T(\beta)).$$
Reflecting on the action of a linear transformation, we see intuitively that the larger the nullity,
the smaller the rank. In other words, the more vectors that are carried into 0, the smaller the range.
The same heuristic reasoning tells us that the larger the rank, the smaller the nullity. The balance
between rank and nullity is made precise in the next theorem.
Theorem 2.5.3 (Dimension Theorem). Let V and W be vector spaces, and let T : V → W be
linear. If V is finite-dimensional, then
nullity(T ) + rank(T ) = dim(V ).
Proof. Suppose that dim(V ) = n, dim(N (T )) = k, and {v1 , . . . , vk } is a basis for N (T ). We may
extend {v1 , . . . , vk } to a basis β = {v1 , . . . , vn } for V . We claim that S = {T (vk+1 ), . . . , T (vn )}
is a basis for R(T ).
First we prove that S generates R(T ). Using Theorem 2.5.2 and the fact that T (vi ) = 0 for
1 ≤ i ≤ k, we have
R(T ) = span({T (vk+1 ), . . . , T (vn )})
Now we prove that S is linearly independent. Form
$$\sum_{i=k+1}^{n} b_i T(v_i) = 0 \quad \text{for } b_{k+1}, \ldots, b_n \in F.$$
Using the fact that T is linear, we have
$$T\left(\sum_{i=k+1}^{n} b_i v_i\right) = 0, \quad \text{which implies} \quad \sum_{i=k+1}^{n} b_i v_i \in N(T).$$
Hence there exist c1 , . . . , ck ∈ F such that
$$\sum_{i=k+1}^{n} b_i v_i = \sum_{i=1}^{k} c_i v_i, \quad \text{which implies} \quad \sum_{i=1}^{k} (-c_i)v_i + \sum_{i=k+1}^{n} b_i v_i = 0.$$
Since β is a basis for V , we have bi = 0 for all i. Hence S is linearly independent. Notice that this
argument also shows that T (vk+1 ), . . . , T (vn ) are distinct, and hence rank(T ) = n − k.
Theorem 2.5.4. Let V and W be vector spaces, and let T : V → W be linear. Then T is one-to-one if and only if N (T ) = {0}.
Proof. First note that T is one-to-one if and only if T (x) = T (y) implies x = y. Suppose N (T ) =
{0} and T (x) = T (y). Since T is linear, T (x − y) = T (x) − T (y) = 0. Thus, x − y ∈ N (T ).
By assumption, N (T ) = {0}. Hence, x − y = 0, which implies x = y. Now assume that T is
injective. Let x ∈ N (T ), then T (x) = 0 = T (0). Hence x = 0, since T is injective.
Example 2.5.4. Let T : M2×3 (R) → M2×2 (R) be defined by
$$T\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} 2a_{11} - a_{12} & a_{13} + 2a_{12} \\ 0 & 0 \end{pmatrix}.$$
For any x, y ∈ M2×3 (R) and any c ∈ R,
$$\begin{aligned}
T\begin{pmatrix} cx_{11} + y_{11} & cx_{12} + y_{12} & cx_{13} + y_{13} \\ cx_{21} + y_{21} & cx_{22} + y_{22} & cx_{23} + y_{23} \end{pmatrix}
&= \begin{pmatrix} 2(cx_{11} + y_{11}) - (cx_{12} + y_{12}) & (cx_{13} + y_{13}) + 2(cx_{12} + y_{12}) \\ 0 & 0 \end{pmatrix} \\
&= \begin{pmatrix} c(2x_{11} - x_{12}) & c(x_{13} + 2x_{12}) \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 2y_{11} - y_{12} & y_{13} + 2y_{12} \\ 0 & 0 \end{pmatrix} \\
&= cT\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} + T\begin{pmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \end{pmatrix},
\end{aligned}$$
so T is linear.
To find the null space, we must find an x ∈ M2×3 (R) such that
2x11 − x12 = 0
and
x13 + 2x12 = 0.
Equating coefficients,
$$\begin{pmatrix} 2 & -1 & 0 \\ 0 & 2 & 1 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & 0 & 1/4 \\ 0 & 1 & 1/2 \end{pmatrix}.$$
Thus, N (T ) = {(−r/4, −r/2, r, s, t, u)} for r, s, t, u ∈ R. Hence, a basis for the null space can be
written
$$\beta_N = \left\{ \begin{pmatrix} -1/4 & -1/2 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right\}$$
and nullity(T ) = dim(N (T )) = 4. Since
$$\begin{pmatrix} 2a_{11} - a_{12} & a_{13} + 2a_{12} \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 2a_{11} - a_{12} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & a_{13} + 2a_{12} \\ 0 & 0 \end{pmatrix} = (2a_{11} - a_{12})\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + (a_{13} + 2a_{12})\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$
a basis for the range is given by
$$\beta_R = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \right\}$$
and rank(T ) = dim(R(T )) = 2. This verifies the dimension theorem, since dim(M2×3 (R)) =
6 = 2 + 4. Since nullity(T ) ≠ 0, T is not one-to-one. Since rank(T ) = 2 ≠ 4 = dim(M2×2 (R)), T is
not onto.
Example 2.5.5. Let
$$A = \begin{pmatrix} 1 & -1 & -1 & 2 & 1 \\ 2 & -2 & -1 & 3 & 3 \\ -1 & 1 & -1 & 0 & -3 \end{pmatrix}$$
and define LA : M5×1 → M3×1 by LA (x) = Ax. To find a basis for N (LA ), solve Ax = 0. We have
$$A = \begin{pmatrix} 1 & -1 & -1 & 2 & 1 \\ 2 & -2 & -1 & 3 & 3 \\ -1 & 1 & -1 & 0 & -3 \end{pmatrix} \overset{\text{rref}}{\sim} \begin{pmatrix} 1 & -1 & 0 & 1 & 2 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
Thus, N (LA ) = {(r − s − 2t, r, s − t, s, t) : r, s, t ∈ R}. Hence, a basis for the null space can be
written
$$\beta_N = \left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -2 \\ 0 \\ -1 \\ 0 \\ 1 \end{pmatrix} \right\}$$
and nullity(LA ) = dim(N (LA )) = 3. In general, Ax = b if and only if b is a linear combination of
the columns of A (the column space). Hence
$$\beta_R = \left\{ \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix} \right\}.$$
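These null-space and column-space computations can be reproduced directly with SymPy, which also confirms the dimension theorem for LA (rank + nullity = 5):

```python
# Null space and column space of A from Example 2.5.5, confirming rank + nullity = dim.
import sympy as sp

A = sp.Matrix([
    [ 1, -1, -1, 2,  1],
    [ 2, -2, -1, 3,  3],
    [-1,  1, -1, 0, -3],
])
null_basis = A.nullspace()      # basis for N(L_A): three vectors
col_basis = A.columnspace()     # basis for R(L_A): two vectors
print(len(null_basis), len(col_basis))               # 3 2
print(len(null_basis) + len(col_basis) == A.cols)    # True: 3 + 2 = 5
```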
Chapter 3
Basic Matrix Properties and Operations
3.1 Determinants
3.1.1 Minors, Cofactors, and Evaluating Determinants
Definition 3.1.1 (Minor and Cofactor of a Matrix). Let A be an n × n matrix. Let Aij be the
(n − 1) × (n − 1) submatrix obtained by deleting row i and column j from A. Then, the scalar
Mij ≡ det Aij
is called the (i, j)th minor of A and the scalar
Cij ≡ (−1)i+j Mij
is called the (i, j)th cofactor of A.
Definition 3.1.2 (Determinant). The determinant of an n × n matrix A is given by
det A = a11 C11 + a12 C12 + · · · + a1n C1n
= a11 M11 − a12 M12 + · · · + (−1)n+1 a1n M1n .
Example 3.1.1. To calculate the determinant of a 2 × 2 matrix we have
$$\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21}. \tag{3.1}$$
To calculate the determinant of a 3 × 3 matrix we have
$$\det(A_{3\times 3}) = \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = a_{11}\det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} - a_{12}\det\begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} + a_{13}\det\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}. \tag{3.2}$$
3.1.2 Properties of Determinants
The following is a list of some useful properties of determinants:
(a) Rows and columns can be interchanged without affecting the value of a determinant. That is
|A| = |AT |.
(b) If two rows (or columns) are interchanged, the sign of the determinant changes. For example
$$\begin{vmatrix} 3 & 4 \\ 1 & -2 \end{vmatrix} = -\begin{vmatrix} 1 & -2 \\ 3 & 4 \end{vmatrix}.$$
(c) If a row (or column) is changed by adding to or subtracting from its elements the corresponding elements of any other row (or column), the determinant remains unaltered. For example
$$\begin{vmatrix} 3 & 4 \\ 1 & -2 \end{vmatrix} = \begin{vmatrix} 3 + 1 & 4 - 2 \\ 1 & -2 \end{vmatrix} = \begin{vmatrix} 4 & 2 \\ 1 & -2 \end{vmatrix} = -10.$$
(d) If the elements in any row (or column) have a common factor α, then the determinant equals
the determinant of the corresponding matrix in which α = 1, multiplied by α. For example
$$\begin{vmatrix} 6 & 8 \\ 1 & -2 \end{vmatrix} = 2\begin{vmatrix} 3 & 4 \\ 1 & -2 \end{vmatrix} = 2 \times (-10) = -20.$$
(e) When at least one row (or column) of a matrix is a linear combination of the other rows (or
columns), the determinant is zero. Conversely, if the determinant is zero, then at least one
row and/or one column are linearly dependent on the other rows and columns, respectively.
For example, the determinant
$$\det\begin{pmatrix} 3 & 2 & 1 \\ 1 & 2 & -1 \\ 2 & -1 & 3 \end{pmatrix}$$
is zero because the first column is a linear combination of the second and third columns:
column 1 = column 2 + column 3.
Similarly, there is a linear dependence between the rows, which is given by the relation
$$\text{row 1} = \frac{7}{5}\,\text{row 2} + \frac{4}{5}\,\text{row 3}.$$
(f) The determinant of an upper triangular or lower triangular matrix is the product of the main
diagonal entries. For example,
$$\begin{vmatrix} 3 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & 4 \end{vmatrix} = 3 \times 2 \times 4 = 24.$$
(g) The determinant of the product of two square matrices is the product of the individual determinants. For example,
|AB| = |A||B|.
This rule can be generalized to any number of factors. One immediate application is to matrix
powers: |A²| = |A||A| = |A|², and more generally |Aⁿ| = |A|ⁿ for integer n.
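Properties such as |Aᵀ| = |A|, |AB| = |A||B|, and |Aⁿ| = |A|ⁿ are easy to spot-check numerically (a sketch with randomly generated matrices; exact equality is replaced by a floating-point tolerance, and this is of course not a proof):

```python
# Numerical spot-check of determinant properties (not a proof).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                        # |A^T| = |A|
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # |AB| = |A||B|
print(np.isclose(np.linalg.det(A @ A), np.linalg.det(A) ** 2))                 # |A^2| = |A|^2
```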
3.1.3 Singular Matrices and Rank
If the determinant of an n × n square matrix is zero, then the matrix is said to be singular. This
means that at least one row and/or one column are linearly dependent on the others. The rank r of a
matrix A, written r = rank A, is the greatest number of linearly independent rows or columns that
exist in the matrix A. Numerically, r is equal to the order of the largest non-vanishing determinant
|B| associated with any square matrix B, which can be constructed from A by a combination of r
rows and r columns. If the determinant of A is nonzero, then A is said to be nonsingular. The rank
of a nonsingular n × n matrix is equal to n. The rank of AT is the same as that of A, since it is only
necessary to swap “rows” and “columns” in the definition.
Example 3.1.2. The 3 × 3 matrix
$$A = \begin{pmatrix} 3 & 2 & 2 \\ 1 & 2 & -1 \\ 2 & -1 & 3 \end{pmatrix}$$
has rank r = 3 because |A| = −5 ≠ 0.
Example 3.1.3. The matrix
$$A = \begin{pmatrix} 3 & 2 & 1 \\ 1 & 2 & -1 \\ 2 & -1 & 3 \end{pmatrix}$$
already used in Section 3.1.2 above is singular because its first row and column may be expressed
as linear combinations of the others. Removing the first row and column, we are left
with a 2 × 2 matrix whose determinant is 2(3) − (−1)(−1) = 5 ≠ 0. Consequently, matrix A has
rank r = 2.
3.2 Inverses of Matrices
Definition 3.2.1 (Matrix Inverse). Let I = [e1 , e2 , . . . , en ] be an n × n identity matrix. Let A be an
n × n matrix. An n × n matrix, B, is called an inverse of A if and only if AB = I and BA = I. B
is typically written as A−1 .
The most important application of inverses is the solution of linear systems. Suppose
Ax = y,
where x and y are n × 1 column vectors. Premultiplying both sides by A−1 we get the inverse
relationship
x = A−1 y.
More generally, consider the matrix equation for multiple (m) right-hand sides:
$$\underset{n\times n}{A}\ \underset{n\times m}{X} = \underset{n\times m}{Y},$$
which reduces to Ax = y for m = 1. The inverse relation that gives X as a function of Y is
$$X = A^{-1}Y.$$
In particular, the solution of
AX = I
is X = A−1 . Practical methods for computing inverses are based on directly solving this equation.
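In numerical work one usually solves Ax = y (or AX = I) directly rather than forming A⁻¹ explicitly. A numpy sketch; the matrix used is the one inverted by hand in Example 3.2.3 below, and the right-hand side is arbitrary:

```python
# Solving Ax = y directly versus forming the inverse (the direct solve is preferred in practice).
import numpy as np

A = np.array([[2.0, 4.0, 2.0],
              [3.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])     # arbitrary right-hand side for the illustration

x_direct = np.linalg.solve(A, y)          # solves Ax = y without forming A^{-1}
A_inv = np.linalg.solve(A, np.eye(3))     # solves AX = I, i.e. computes A^{-1}
print(np.allclose(x_direct, A_inv @ y))      # True
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```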
3.2.1 Computation of Inverses
The explicit calculation of matrix inverses is seldom needed in large matrix computations. But
occasionally the need arises for calculating the explicit inverse of small matrices by hand, for
example, when inverting Jacobian matrices.
A general formula for the elements of the inverse is as follows. Let B = [bij ] = A⁻¹. Then
$$b_{ij} = \frac{C_{ji}}{|A|},$$
where bij denotes the (i, j)th entry of A⁻¹ and Cij is the (i, j)th cofactor of A or, more precisely,
the determinant of the submatrix of order (n − 1) × (n − 1) obtained by deleting the ith row and
jth column of A, multiplied by (−1)^{i+j}. The n × n matrix whose (i, j)th entry is Cji is called the
adjugate (or classical adjoint) of A and is written adj A.¹
This direct inversion procedure is useful only for small matrix orders: 2 or 3. In the examples
below the inversion formulas for second and third order matrices are listed.
Example 3.2.1. For order n = 2:
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{implies} \quad A^{-1} = \frac{1}{|A|}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix},$$
where |A| is given by equation (3.1).
Example 3.2.2. For order n = 3:
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \qquad A^{-1} = \frac{1}{|A|}\begin{pmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{pmatrix},$$
where
$$\begin{aligned}
C_{11} &= \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, & C_{21} &= -\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}, & C_{31} &= \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}, \\
C_{12} &= -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, & C_{22} &= \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}, & C_{32} &= -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \\
C_{13} &= \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}, & C_{23} &= -\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}, & C_{33} &= \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix},
\end{aligned}$$
and |A| is given by equation (3.2).
Example 3.2.3.
$$A = \begin{pmatrix} 2 & 4 & 2 \\ 3 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \qquad A^{-1} = -\frac{1}{8}\begin{pmatrix} 1 & -4 & 2 \\ -2 & 0 & 4 \\ -1 & 4 & -10 \end{pmatrix}.$$
If the order exceeds 3, the general inversion formula becomes rapidly useless as it displays combinatorial complexity.
¹The adjugate has sometimes been called the "adjoint", but that terminology is ambiguous. Today, "adjoint" of a
matrix normally refers to its conjugate transpose.
3.2.2 Properties of Inverses
The following is a list of some useful properties of inverses:
(a) The inverse of the transpose is equal to the transpose of the inverse. That is
(AT )−1 = (A−1 )T ,
because
AA−1 = (AA−1 )T = (A−1 )T AT = I.
(b) The inverse of a symmetric matrix is also symmetric. Given the previous rule, (AT )−1 =
A−1 = (A−1 )T , hence A−1 is also symmetric.
(c) The inverse of a matrix product is the reverse product of the inverses of the factors. That is
(AB)−1 = B −1 A−1 .
This property generalizes to an arbitrary number of factors.
(d) For a diagonal matrix D in which all diagonal entries are nonzero, D −1 is again a diagonal
matrix with entries 1/dii .
(e) If S is a block diagonal matrix:
$$S = \begin{pmatrix} S_{11} & 0 & 0 & \cdots & 0 \\ 0 & S_{22} & 0 & \cdots & 0 \\ 0 & 0 & S_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & S_{nn} \end{pmatrix} = \operatorname{diag}[S_{ii}],$$
then the inverse matrix is also block diagonal and is given by
$$S^{-1} = \begin{pmatrix} S_{11}^{-1} & 0 & 0 & \cdots & 0 \\ 0 & S_{22}^{-1} & 0 & \cdots & 0 \\ 0 & 0 & S_{33}^{-1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & S_{nn}^{-1} \end{pmatrix} = \operatorname{diag}[S_{ii}^{-1}].$$
(f) The inverse of an upper (lower) triangular matrix is also an upper (lower) triangular matrix.
3.3 Quadratic Forms and Definiteness
3.3.1 Quadratic Forms
Definition 3.3.1 (Quadratic Form). Let A denote an n × n symmetric matrix with real entries and
let x denote an n × 1 column vector. Then
Q(x) = xT Ax
is said to be of quadratic form.
Note that
$$Q(x) = x^T A x = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} \sum_{i=1}^{n} a_{1i}x_i \\ \sum_{i=1}^{n} a_{2i}x_i \\ \vdots \\ \sum_{i=1}^{n} a_{ni}x_i \end{pmatrix} = \sum_{i,j} a_{ij}x_i x_j.$$
For example, consider the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$$
and the vector x. Q is given by
$$Q = x^T A x = \begin{pmatrix} x_1 & x_2 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 & 2x_1 + x_2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1^2 + 4x_1x_2 + x_2^2,$$
which is clearly of quadratic form.
Every quadratic form has a critical point at x = 0. Therefore, we can classify quadratic forms
by whether x = 0 is a maximum, minimum, or neither. This is what definiteness is about.
3.3.2 Definiteness of Quadratic Forms
Definition 3.3.2 (Positive/Negative Definiteness). Let A be an n × n symmetric matrix, then A is
(a) positive definite if xT Ax > 0 for all x ≠ 0 in Rⁿ,
(b) positive semi-definite if xT Ax ≥ 0 for all x ≠ 0 in Rⁿ,
(c) negative definite if xT Ax < 0 for all x ≠ 0 in Rⁿ,
(d) negative semi-definite if xT Ax ≤ 0 for all x ≠ 0 in Rⁿ, and
(e) indefinite if xT Ax > 0 for some x in Rⁿ and < 0 for some other x in Rⁿ.
Example 3.3.1. Consider a 3 × 3 diagonal matrix D given by
$$D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}$$
and a general 3-element vector x. The general quadratic form is given by
$$Q = x^T D x = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 & 2x_2 & 4x_3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1^2 + 2x_2^2 + 4x_3^2.$$
Note that for any real vector x ≠ 0, Q will be positive, since the square of any number is positive,
the coefficients of the squared terms are positive, and the sum of positive numbers is always positive.
Thus, D is positive definite.
Example 3.3.2. Now consider an alternative 3 × 3 matrix given by
$$D = \begin{pmatrix} -2 & 1 & 0 \\ 1 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix}.$$
The general quadratic form is given by
$$Q = x^T D x = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix}\begin{pmatrix} -2 & 1 & 0 \\ 1 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -2x_1 + x_2 & x_1 - 2x_2 & -2x_3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = -2x_1^2 + 2x_1x_2 - 2x_2^2 - 2x_3^2 = -2x_1^2 - 2[x_2^2 - x_1x_2] - 2x_3^2.$$
Note that Q will be negative if x1 and x2 are of opposite sign or equal to one another. Now consider
the case where |x1| > |x2|. Write Q as
$$Q = -2x_1^2 + 2x_1x_2 - 2x_2^2 - 2x_3^2.$$
The first, third, and fourth terms are clearly negative. With |x1| > |x2|, we have |2x1²| > |2x1x2|, so the first
term is more negative than the second term is positive, and hence the whole expression is negative.
Now consider the case where |x1| < |x2|. The first, third, and fourth terms are still negative. But,
with |x1| < |x2|, |2x2²| > |2x1x2|, so that the third term is more negative than the second term is
positive, and so the whole expression is negative. Thus, this quadratic form is negative definite for
all real values of x ≠ 0.
Remark 3.3.1. A matrix that is positive (negative) definite is positive (negative) semi-definite.
The definiteness of a matrix plays an important role. For example, for a function f (x) of one
variable, the sign of the second derivative f ′′ (x0 ) at a critical point x0 gives a sufficient condition
for determining whether x0 is a maximum, minimum, or neither (proposition 1.1.2). This test generalizes to more than one variable using the definiteness of the Hessian matrix H (more on this
when we get to optimization). There is a convenient way to test for the definiteness of a matrix, but
before we can formulate this test we first need to define the concept of principal minors of a matrix.
Definition 3.3.3 (Principal Submatrix and Principal Minor). Let A be an n × n matrix. A k × k
submatrix of A formed by deleting (n − k) columns, say columns i1 , i2 , . . . , in−k , and the same n − k
rows, rows i1 , i2 , . . . , in−k , from A is called a kth order principal submatrix of A. The determinant of
a k × k principal submatrix, denoted Bk , is called a kth order principal minor of A.
Definition 3.3.4 (Leading Principal Submatrix and Leading Principal Minor). Let A be an n × n
matrix. The kth order principal submatrix of A obtained by deleting the last n − k rows and the last
n − k columns from A is called the kth order leading principal submatrix of A. Its determinant,
denoted ∆k , is called the kth order leading principal minor of A.
Example 3.3.3. For a general 3 × 3 matrix

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ],
there is one third order principle minor: B3 = det(A) = |A|. There are three second order principle minors:

1. B2⁽¹⁾ = | a11  a12 |, formed by deleting column 3 and row 3 from A;
           | a21  a22 |

2. B2⁽²⁾ = | a11  a13 |, formed by deleting column 2 and row 2 from A;
           | a31  a33 |

3. B2⁽³⁾ = | a22  a23 |, formed by deleting column 1 and row 1 from A.
           | a32  a33 |

There are three first order principle minors:

1. B1⁽¹⁾ = a11, formed by deleting the last 2 rows and columns;
2. B1⁽²⁾ = a22, formed by deleting the first and third columns and rows; and
3. B1⁽³⁾ = a33, formed by deleting the first 2 rows and columns.

The leading principle minors of order k, denoted ∆k, are

∆1 = a11,   ∆2 = | a11  a12 |,   and   ∆3 = det(A) = |A|.
                 | a21  a22 |

Theorem 3.3.1 (Positive/Negative Definiteness). Let A be an n × n symmetric matrix. Then,
(a) A is positive definite if and only if ∆k > 0 for k = 1, 2, . . . , n (all of its leading principle
minors are strictly positive);
(b) A is negative definite if and only if (−1)k ∆k > 0 for k = 1, 2, . . . , n (every leading principle
minor of odd order is strictly negative and every leading principle minor of even order is
strictly positive);
(c) If some kth order leading principle minor of A (or some pair of them) is nonzero but does not fit either of the above two sign patterns, then A is indefinite.
Theorem 3.3.2 (Positive/Negative Semi-Definiteness). Let A be an n × n symmetric matrix. Then,

(a) A is positive semi-definite if and only if Bk ≥ 0 for k = 1, 2, . . . , n (every principle minor of A is nonnegative);

(b) A is negative semi-definite if and only if (−1)ᵏ Bk ≥ 0 for k = 1, 2, . . . , n (every principle minor of odd order is nonpositive and every principle minor of even order is nonnegative).
Remark 3.3.2. Given an n × n symmetric matrix, the conditions ∆k ≥ 0 for positive semi-definiteness and (−1)ᵏ∆k ≥ 0 for negative semi-definiteness are necessary conditions but not sufficient conditions. To see this, consider the following 2 × 2 symmetric matrix:

A = [ 0   0 ]
    [ 0  −4 ].

For this matrix, ∆1 = 0 and ∆2 = 0. Thus, looking only at leading principle minors, one could falsely conclude that this matrix is positive semi-definite. However, we have to check all principle minors in order to deduce the correct form of semi-definiteness. If one checks all principle minors, one sees

B1⁽¹⁾ = 0 ≤ 0,    B1⁽²⁾ = −4 ≤ 0,    B2 = det(A) = 0 ≥ 0,
which violates the definition of positive semi-definiteness. In fact, we see that this matrix is negative
semi-definite.
Example 3.3.4. Suppose A is a 4 × 4 symmetric matrix. Then
(a) ∆1 > 0, ∆2 > 0, ∆3 > 0, ∆4 > 0 → positive definite
(b) ∆1 < 0, ∆2 > 0, ∆3 < 0, ∆4 > 0 → negative definite
(c) ∆1 > 0, ∆2 > 0, ∆3 = 0, ∆4 < 0 → indefinite because of ∆4
(d) ∆1 < 0, ∆2 < 0, ∆3 < 0, ∆4 < 0 → indefinite because of ∆2 and ∆4
(e) ∆1 = 0, ∆2 < 0, ∆3 > 0, ∆4 = 0 → indefinite because of ∆2
(f) ∆1 > 0, ∆2 = 0, ∆3 > 0, ∆4 > 0 → A is not definite and not negative semi-definite, but it might be positive semi-definite. However, to establish this, we must check all 15 principle minors Bk, for k = 1, 2, 3, 4.
(g) ∆1 = 0, ∆2 > 0, ∆3 = 0, ∆4 > 0 → A is not definite, but may be positive semi-definite
or negative semi-definite. In order to determine semi-definiteness we must check all principle
minors.
Example 3.3.5. Use the above theorems to check the following matrices for definiteness:

(a)
A = [ 1  2 ]
    [ 2  1 ],
∆1 = 1 > 0, ∆2 = −3 < 0. Thus A is indefinite.

(b)
A = [ −1   1 ]
    [  1  −1 ],
B1⁽¹⁾ = B1⁽²⁾ = −1 < 0, ∆2 = 0. Thus A is negative semi-definite.

(c)
B = [  1  0  −1 ]
    [  0  3   0 ]
    [ −1  0   4 ],
∆1 = 1 > 0, ∆2 = 3 > 0, ∆3 = 9 > 0. Thus, B is positive definite.

(d)
B = [ 1   3   2  3 ]
    [ 3   5  −2  4 ]
    [ 2  −2   2  1 ]
    [ 3   4   1  2 ],
∆1 = 1 > 0, ∆2 = −4 < 0, ∆3 = −56 < 0, ∆4 = 142 > 0. Thus B is indefinite.
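The sign checks in Theorem 3.3.1 are easy to automate. The following Python sketch is an editorial illustration, not part of the notes, and the helper function names are hypothetical:

import numpy as np

def leading_principal_minors(A):
    """Return [det(A_1), ..., det(A_n)] for the leading submatrices of A."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def classify(A, tol=1e-12):
    """Classify a symmetric matrix as PD or ND via Theorem 3.3.1; otherwise defer."""
    d = leading_principal_minors(A)
    if all(m > tol for m in d):
        return "positive definite"
    if all(((-1) ** (k + 1)) * m > tol for k, m in enumerate(d)):
        return "negative definite"
    return "indefinite or only semi-definite (check all principle minors)"

A = np.array([[1.0, 2.0], [2.0, 1.0]])          # example (a): indefinite
B = np.array([[1.0, 0.0, -1.0],
              [0.0, 3.0, 0.0],
              [-1.0, 0.0, 4.0]])                # example (c): positive definite
print(classify(A))  # indefinite or only semi-definite (check all principle minors)
print(classify(B))  # positive definite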
3.4 Eigenvalues and Eigenvectors
Consider the special form of the linear system Ax = y in which the right-hand side vector y is a
multiple of the solution vector x:
Ax = λx,    (3.3)

or, written in full,

a11 x1 + a12 x2 + · · · + a1n xn = λx1
a21 x1 + a22 x2 + · · · + a2n xn = λx2
        ⋮                            ⋮
an1 x1 + an2 x2 + · · · + ann xn = λxn.
This is called the standard (or classical) algebraic eigenproblem.
Definition 3.4.1 (Eigenvalues and Eigenvectors). The number λ is said to be an eigenvalue of the
n × n matrix A provided that there exists a nonzero vector x such that
Ax = λx,
in which case the vector x is called an eigenvector of the matrix A.
The system (3.3) can be rearranged into homogeneous form
(A − λI)x = 0.
A nontrivial solution of this equation is possible if and only if the coefficient matrix A − λI is
singular. Such a condition can be expressed as the vanishing of the determinant
            | a11 − λ    a12      · · ·   a1n     |
|A − λI| =  | a21        a22 − λ  · · ·   a2n     | = 0.
            |   ⋮          ⋮        ⋱       ⋮      |
            | an1        an2      · · ·   ann − λ |
When this determinant is expanded, we obtain an algebraic polynomial equation in λ of degree n:
P (λ) = λn + α1 λn−1 + · · · + αn = 0.
This is known as the characteristic equation of the matrix A. The left-hand side is known as the
characteristic polynomial. We know that a polynomial of degree n has n (generally complex) roots
λ1 , λ2 , . . . , λn . These n numbers are called eigenvalues, eigenroots, or characteristic values of the
matrix A. The following theorem summarizes the results.
Theorem 3.4.1. The number λ is an eigenvalue of the n × n matrix A if and only if λ satisfies the
characteristic equation
|A − λI| = 0.
With each eigenvalue λi, there is an associated vector xi that satisfies

Axi = λi xi.

This xi is called an eigenvector or characteristic vector of the matrix A. An eigenvector is unique only up to a scale factor since if xi is an eigenvector, so is βxi, where β is an arbitrary nonzero number. For a general 2 × 2 matrix

A = [ a11  a12 ]
    [ a21  a22 ],
the characteristic polynomial is given by
|A − λI| = λ2 − (a11 + a22 )λ + a11 a22 − a12 a21 ,
and we can solve for the associated eigenvalues by setting the above characteristic equation to zero
and solving for λ.
Example 3.4.1. Find the eigenvalues and associated eigenvectors of the matrix

A = [  5    7 ]
    [ −2   −4 ].

Solution:

A − λI = [ 5 − λ      7     ]
         [  −2     −4 − λ   ]    (3.4)

so the characteristic equation of A is

0 = det(A − λI)
  = (5 − λ)(−4 − λ) − (−2)(7)
  = λ² − λ − 6
  = (λ + 2)(λ − 3).

Thus, the matrix A has two eigenvalues, −2 and 3. To distinguish them, we write λ1 = −2 and λ2 = 3. To find the associated eigenvectors, we must separately substitute each eigenvalue into (3.4) and then solve the resulting system (A − λI)x = 0.
Case 1: λ1 = −2. With x = (x1, x2)ᵀ, the system (A − λI)x = 0 is

[  7   7 ] [ x1 ]   [ 0 ]
[ −2  −2 ] [ x2 ] = [ 0 ].

Each of the two scalar equations here is a multiple of the equation x1 + x2 = 0, and any nontrivial solution x = (x1, x2)ᵀ of this equation is a nonzero multiple of (1, −1)ᵀ. Hence, to within a constant multiple, the only eigenvector associated with λ1 = −2 is x1 = (1, −1)ᵀ.
Case 2: λ2 = 3. With x = (x1, x2)ᵀ, the system (A − λI)x = 0 is

[  2   7 ] [ x1 ]   [ 0 ]
[ −2  −7 ] [ x2 ] = [ 0 ].

Again, we have only a single equation, 2x1 + 7x2 = 0, and any nontrivial solution of this equation will suffice. The choice x2 = −2 yields x1 = 7, so (to within a constant multiple) the only eigenvector associated with λ2 = 3 is x2 = (7, −2)ᵀ.
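A numerical cross-check of this example (an editorial addition, not part of the notes) using NumPy; note that numerically computed eigenvectors are normalized, so they match the hand-derived ones only up to a scalar multiple:

import numpy as np

A = np.array([[5.0, 7.0],
              [-2.0, -4.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors
print(eigvals)                        # approximately [3., -2.] (order may differ)

# Verify A v = lambda v for every eigenvalue/eigenvector pair
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))   # True, True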
Example 3.4.2. Find the eigenvalues and eigenvectors of the 2 × 2 matrix

A = [ −1  3 ]
    [  2  0 ].

Solution: Given A, |A − λI| = λ² + λ − 6 = 0. It follows that λ1 = −3 and λ2 = 2. We can then set up the equation Ax = λx, or (A − λI)x = 0, for each eigenvalue. Specifically, we have

[ 2  3 ] [ x1 ]   [ 0 ]          [ −3   3 ] [ x1 ]   [ 0 ]
[ 2  3 ] [ x2 ] = [ 0 ]   and    [  2  −2 ] [ x2 ] = [ 0 ],

which results in

x1 = (3, −2)ᵀ   and   x2 = (1, 1)ᵀ,

or multiples thereof.
Definition 3.4.2 (Eigenspace). The set of all the eigenvectors, corresponding to an eigenvalue, and
the zero vector, is a vector space called the eigenspace of λ, and may be written as E(λ).
In the above example, the eigenspace corresponding to λ1 is the line defined by the zero vector and (3, −2)ᵀ. Likewise, the eigenspace corresponding to λ2 is the line defined by the zero vector and (1, 1)ᵀ. If A is a k × k square matrix, we can have k real eigenvalues, some repeated eigenvalues, or complex eigenvalues. The last two cases are mutually exclusive.
Theorem 3.4.2. Let A be a k × k matrix with eigenvalues λ1 , . . . , λk . Then,
(a) λ1 + λ2 + ... + λk = tr(A) and
(b) λ1 · λ2 · · · λk = det (A).
Proof. Consider the case of a 2 × 2 matrix. Assume

A = [ a11  a12 ]
    [ a21  a22 ].

Then |A − λI| = λ² − (a11 + a22)λ + (a11 a22 − a12 a21) = 0, so the characteristic polynomial is

p(λ) = λ² − (a11 + a22)λ + (a11 a22 − a12 a21)
     = λ² − tr(A)λ + det(A).

If λ1 and λ2 are the roots of this polynomial, we can also write it as

p(λ) = β(λ1 − λ)(λ2 − λ) = βλ² − β(λ1 + λ2)λ + βλ1 λ2,

for some constant β. Comparing the coefficients on λ² shows β = 1, so matching the remaining coefficients gives

tr(A) = λ1 + λ2,    det(A) = λ1 λ2.

The theorem naturally extends to higher dimensional cases.
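A quick numerical illustration of Theorem 3.4.2 (an editorial addition, not from the notes), using the matrix from Example 3.4.2:

import numpy as np

A = np.array([[-1.0, 3.0],
              [2.0, 0.0]])             # matrix from Example 3.4.2

lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))        # True: sum of eigenvalues = trace
print(np.isclose(lam.prod(), np.linalg.det(A)))  # True: product of eigenvalues = det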
3.4.1 Properties of Eigenvalues and Eigenvectors
(a) Eigenvalues and eigenvectors are defined only for square matrices.
(b) A square matrix A is singular if and only if 0 is an eigenvalue. However, the zero vector
cannot be an eigenvector.
(c) A matrix is invertible if and only if none of its eigenvalues is equal to zero.
(d) Every eigenvalue has an infinite number of eigenvectors associated with it, as any nonzero
scalar multiple of an eigenvector is also an eigenvector.
(e) The entries of a diagonal matrix D are eigenvalues of D.
(f) The eigenvalues of A and AT are the same (as their characteristic polynomials are the same),
but there is no simple relationship between their eigenvectors.
(g) The eigenvalues of a shifted matrix A − αI are λ − α and the eigenvectors are the same as
those of A since
Ax = λx ⇒ (A − αI)x = (λ − α)x.
(h) The eigenvalues of A⁻¹ are 1/λ and the eigenvectors are the same as those of A since
    Ax = λx ⇒ A⁻¹x = λ⁻¹x.
(i) The eigenvalues of A² are λ², and the eigenvectors are the same as those of A since
    Ax = λx ⇒ A²x = A(λx) = λ(Ax) = λ²x.
    This property naturally generalizes to higher-order powers (i.e., the eigenvalues of Aᵏ are λᵏ).
(j) Let x be an eigenvector of an n × n matrix A with eigenvalue λ. Then, the eigenvalue of
    αk Aᵏ + αk−1 Aᵏ⁻¹ + · · · + α1 A + α0 I
    associated with eigenvector x is
    αk λᵏ + αk−1 λᵏ⁻¹ + · · · + α1 λ + α0,
    where αk, αk−1, . . . , α1, α0 are real numbers and k is a positive integer.
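Properties (g), (h), and (i) are easy to verify numerically; the following sketch is an editorial addition that uses the matrix from Example 3.4.1 and a hypothetical shift α = 2:

import numpy as np

A = np.array([[5.0, 7.0],
              [-2.0, -4.0]])
alpha = 2.0

lam = np.sort(np.linalg.eigvals(A).real)

# (g) shifted matrix, (h) inverse, (i) square: eigenvalues transform as claimed
print(np.allclose(np.sort(np.linalg.eigvals(A - alpha * np.eye(2)).real), lam - alpha))
print(np.allclose(np.sort(np.linalg.eigvals(np.linalg.inv(A)).real), np.sort(1 / lam)))
print(np.allclose(np.sort(np.linalg.eigvals(A @ A).real), np.sort(lam ** 2)))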
3.4.2 Definiteness and Eigenvalues
Definition 3.4.3. Let A be an n × n symmetric matrix. Then,
(a) A is positive definite if and only if all of its eigenvalues are strictly positive (λi > 0 ∀i)
(b) A is negative definite if and only if all of its eigenvalues are strictly negative (λi < 0 ∀i)
(c) A is positive semi-definite if and only if all of its eigenvalues are nonnegative (λi ≥ 0 ∀i)
(d) A is negative semi-definite if and only if all of its eigenvalues are nonpositive (λi ≤ 0 ∀i)
Example 3.4.3. If

A = [ 1  2 ]
    [ 2  1 ],

then ∆1 = 1 > 0 and ∆2 = −3 < 0, which implies that A is indefinite. Or, using eigenvalues, we can solve

det(A − λI) = λ² − 2λ − 3 = (λ − 3)(λ + 1) = 0,

which implies λ1 = −1 and λ2 = 3, and hence indefiniteness.
Example 3.4.4. If

A = [ −1   1 ]
    [  1  −1 ],

then ∆1 = −1 < 0, ∆2 = 0, and also B1⁽²⁾ = −1 < 0. Thus, A is negative semi-definite. Alternatively, using eigenvalues,

det(A − λI) = λ(λ + 2) = 0,

which implies λ1 = 0 and λ2 = −2 and fulfills the condition for negative semi-definiteness.


Example 3.4.5. If

A = [  1  0  −1 ]
    [  0  3   0 ]
    [ −1  0   4 ],

then ∆1 = 1 > 0, ∆2 = 3 > 0, and ∆3 = 9 > 0, which implies A is positive definite. Alternatively, using eigenvalues,

det(A − λI) = (3 − λ)(λ² − 5λ + 3) = 0,

which implies λ1,2 = (5 ± √13)/2 and λ3 = 3. All three eigenvalues are strictly positive, which fulfills the condition for positive definiteness.
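The eigenvalue test above can be coded directly; the sketch below is an editorial illustration (the helper name is hypothetical), applied to the matrices of Examples 3.4.3 through 3.4.5:

import numpy as np

def definiteness_via_eigenvalues(A, tol=1e-12):
    """Classify a symmetric matrix using the sign pattern of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix, ascending
    if np.all(lam > tol):   return "positive definite"
    if np.all(lam < -tol):  return "negative definite"
    if np.all(lam >= -tol): return "positive semi-definite"
    if np.all(lam <= tol):  return "negative semi-definite"
    return "indefinite"

print(definiteness_via_eigenvalues(np.array([[1.0, 2.0], [2.0, 1.0]])))    # indefinite
print(definiteness_via_eigenvalues(np.array([[-1.0, 1.0], [1.0, -1.0]])))  # negative semi-definite
print(definiteness_via_eigenvalues(np.array([[1.0, 0.0, -1.0],
                                             [0.0, 3.0, 0.0],
                                             [-1.0, 0.0, 4.0]])))          # positive definite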
Chapter 4
Concavity, Convexity, Quasi-Concavity,
and Quasi-Convexity
4.1 Convex Sets
Definition 4.1.1 (Convex Set). A set X ⊆ Rn is (strictly) convex if given any two points x′ and x′′
in X, the point
xλ = (1 − λ)x′ + λx′′
is also in X (int X) for every λ ∈ [0, 1] (λ ∈ (0, 1)).
Remark 4.1.1. A vector of the form xλ = (1 − λ)x′ + λx′′ , with λ ∈ [0, 1] is called a convex
combination of x′ and x′′ .
Theorem 4.1.1. Let X and Y be convex sets in Rn, and let α be a real number. Then the sets

αX = {z ∈ Rn : z = αx for some x ∈ X}

and

X + Y = {z ∈ Rn : z = x + y for some x ∈ X and y ∈ Y}

are convex.
Proof. Let X and Y be any two convex subsets of Rn . Suppose αx′ and αx′′ are any two points in
αX, naturally with x′ and x′′ in X. Given λ ∈ [0, 1], since X is convex, we know λx′ +(1−λ)x′′ ∈
X. Thus,
λ(αx′ ) + (1 − λ)(αx′′ ) = α[λx′ + (1 − λ)x′′ ] ∈ αX.
Therefore αX is shown to be convex. Now suppose x′ + y′ and x′′ + y′′ are any two points in
X + Y , naturally with x′ and x′′ in X and y′ and y′′ in Y . Given λ ∈ [0, 1], since X and Y are
both convex, we know λx′ + (1 − λ)x′′ ∈ X and λy′ + (1 − λ)y′′ ∈ Y . Thus,
λ(x′ + y′ ) + (1 − λ)(x′′ + y′′ ) = λx′ + (1 − λ)x′′ + λy′ + (1 − λ)y′′ ∈ X + Y
Therefore X + Y is shown to be convex.
Example 4.1.1. Is S = {(x, y) ∈ R2 | y ≥ x ∧ xy ≥ 1} open, closed, compact or convex?
Solution: Not open, since S ≠ int S. For example, (1, 1) is in the set, but every neighborhood around (1, 1) contains points which are not in the set. Closed, since the set contains all of its boundary points. Not compact, since the set is not bounded [(n, n) is in the set for all n ∈ N]. Not convex, since (1, 1) and (−1, −1) are in the set, but (1/2)(1, 1) + (1/2)(−1, −1) = (0, 0) is not.
Figure 4.1: Convex and Non-Convex Sets
(a) Strictly Convex
(b) Convex, but not Strictly Convex
(c) Not Convex
4.2 Concave and Convex Functions
Definition 4.2.1 (Concave Function). The function f : Rn ⊇ X → R, where X is a convex set, is
concave if given any two points x′ and x′′ in X we have
(1 − λ)f (x′ ) + λf (x′′ ) ≤ f [(1 − λ)x′ + λx′′ ] ≡ f (xλ ) ∀λ ∈ [0, 1]
and is strictly concave if the inequality holds strictly for λ ∈ (0, 1), that is, if
∀x′ , x′′ ∈ X and λ ∈ (0, 1), (1 − λ)f (x′ ) + λf (x′′ ) < f [(1 − λ)x′ + λx′′ ] ≡ f (xλ ).
Remark 4.2.1. Reversing the direction of the inequalities in the definition, we obtain the definitions of convexity and strict convexity.
Many introductory calculus texts call convex functions “concave up” and concave functions
“concave down”, as we did in section 1.1.4. Henceforth, we will stick with the more classical
terms: “convex” and “concave”.
Theorem 4.2.1. Let f : Rn ⊇ X → R be a C 1 function defined on an open and convex set X.
Then f is concave if and only if given any two points x and x0 in X, we have
f (x) ≤ f (x0 ) + ∇f (x0 )(x − x0 ).
Moreover, f is strictly concave if and only if the inequality holds strictly, that is, if and only if
f (x) < f (x0 ) + ∇f (x0 )(x − x0 )
for all pairs of distinct points x0 and x in X.
Remark 4.2.2. This theorem says that a function f is concave if and only if the graph of f lies
everywhere on or below any tangent plane. Equivalently, it says that a function is concave if and
only if the slope of the function at some arbitrary point, say x0 < x, is greater than the slope of the
secant line between points x and x0 . Reversing the direction of the inequalities in the theorem, we
obtain a theorem corresponding to convexity and strict convexity.
Figure 4.2: Concave and Convex Functions
(a) Concave Function
(b) Convex Function
Theorem 4.2.2. Let f : Rn ⊇ X → R be a concave function and g : R → R be an increasing and
concave function defined on an interval I containing f (X). Then the function h : X → R defined
by h(x) = g[f (x)] is concave. Moreover, if f is strictly concave and g is strictly increasing then h
is strictly concave. Analogous claims hold for f convex (again with g increasing).
Proof. Consider any x′ , x′′ ∈ X and λ ∈ [0, 1]. Since f is concave, we have
f (xλ ) ≡ f (λx′ + (1 − λ)x′′ ) ≥ λf (x′ ) + (1 − λ)f (x′′ ).
Since g is increasing and concave
h(xλ ) = g[f (λx′ + (1 − λ)x′′ )] ≥ g[λf (x′ ) + (1 − λ)f (x′′ )]
≥ λg[f (x′ )] + (1 − λ)g[f (x′′ )]
= λh(x′ ) + (1 − λ)h(x′′ ).
This establishes that h is concave. If f is strictly concave and g is strictly increasing, then the
inequality is strict and hence h is strictly concave.
Example 4.2.1. Let the domain be R++ . Consider h(x) = e1/x . Let f (x) = 1/x and let g(y) = ey .
Then h(x) = g[f (x)]. Function f is strictly convex and g is (strictly) convex and strictly increasing.
Therefore, by Theorem 4.2.2, h is strictly convex.
Remark 4.2.3. It is important in Theorem 4.2.2 that g be increasing. To see this, let the domain be R++. Consider h(x) = e^(−x²), which is just the standard normal density except that it is off by a factor of 1/√(2π). Let f(x) = e^(x²) and let g(y) = 1/y. Then h(x) = g[f(x)]. Now, f is convex on R++ and g is also convex on R++. The function h is not, however, convex. While it is strictly convex for |x| sufficiently large, for x near zero it is strictly concave. This does not contradict Theorem 4.2.2 because g here is decreasing.
Theorem 4.2.3. Suppose f1, . . . , fn are concave functions, where fi : Rn ⊇ X → R. Then for any α1, . . . , αn with each αi ≥ 0, the function f ≡ Σ_{i=1}^n αi fi is also concave. If, in addition, at least one fj is strictly concave and αj > 0, then f is strictly concave.
Proof. Consider any x′, x′′ ∈ X and λ ∈ [0, 1]. If each fi is concave, we have

fi(xλ) ≡ fi(λx′ + (1 − λ)x′′) ≥ λfi(x′) + (1 − λ)fi(x′′).

Therefore,

f(xλ) ≡ f(λx′ + (1 − λ)x′′) = Σ_{i=1}^n αi fi(λx′ + (1 − λ)x′′)
      ≥ Σ_{i=1}^n αi [λfi(x′) + (1 − λ)fi(x′′)]
      = λ Σ_{i=1}^n αi fi(x′) + (1 − λ) Σ_{i=1}^n αi fi(x′′) ≡ λf(x′) + (1 − λ)f(x′′).

This establishes that f is concave. If some fi is strictly concave and αi > 0, then the inequality is strict.
4.3 Concavity, Convexity, and Definiteness
The following result says that a function is concave if and only if its Hessian is negative semidefinite everywhere. A twice-differentiable function of a single variable is concave (convex) if and
only if f ′′ (x) ≤ (≥)0 everywhere.
Theorem 4.3.1. Let f be a C 2 function on an open convex set X of Rn . Then
(a) If H(f ) is negative definite for every x ∈ X, then f is strictly concave.
(b) If H(f ) is negative semi-definite for every x ∈ X, then f is concave.
(c) If f is concave, then H(f ) is negative semi-definite for every x ∈ X.
(d) If H(f ) is positive definite for every x ∈ X, then f is strictly convex.
(e) If H(f ) is positive semi-definite for every x ∈ X, then f is convex.
(f) If f is convex, then H(f ) is positive semi-definite for every x ∈ X.
Remark 4.3.1. Note that if f is strictly concave, the Hessian can either be negative semi-definite
or negative definite. Thus, if you show that the Hessian is not negative definite, but only negative
semi-definite, you cannot conclude that f is not strictly concave.
Example 4.3.1. The Hessian of the function f(x, y) = x⁴ + x²y² + y⁴ − 3x − 8y is

H(f) = [ 12x² + 2y²      4xy       ]
       [    4xy        2x² + 12y²  ].

The principle minors, B1⁽¹⁾ = 12x² + 2y², B1⁽²⁾ = 2x² + 12y², and B2 = 24x⁴ + 132x²y² + 24y⁴, are all weakly positive for all values of x and y, so f is a convex function on all of R2.
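The Hessian and its principle minors in this example can be reproduced symbolically; the following SymPy sketch is an editorial addition, not part of the notes:

import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**4 + x**2 * y**2 + y**4 - 3*x - 8*y

H = sp.hessian(f, (x, y))
print(H)                    # Matrix([[12*x**2 + 2*y**2, 4*x*y], [4*x*y, 2*x**2 + 12*y**2]])
print(sp.expand(H.det()))   # 24*x**4 + 132*x**2*y**2 + 24*y**4, which is >= 0 everywhere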
Example 4.3.2. A simple utility or production function is f(x, y) = xy. Its Hessian is

H(f) = [ 0  1 ]
       [ 1  0 ],

whose second order principle minor is det H(f) = −1. Since this second order principle minor is negative, H(f) is indefinite and f is neither concave nor convex.
Example 4.3.3. Consider the monotonic transformation of the function f in the previous example by the function g(z) = z^{1/4}: g[f(x, y)] = x^{1/4} y^{1/4}, defined only on the positive quadrant R2+. The Hessian of g is

H(g) = [ −(3/16) x^{−7/4} y^{1/4}     (1/16) x^{−3/4} y^{−3/4} ]
       [  (1/16) x^{−3/4} y^{−3/4}   −(3/16) x^{1/4} y^{−7/4}  ].

The first order principle minors are both negative and the second order leading principle minor, x^{−3/2} y^{−3/2}/32, is at least weakly positive. Therefore, H(g) is negative semi-definite on R2+ and g is a concave function on R2+. This result reaffirms the result of Theorem 4.2.2.
4.4 Quasi-concave and Quasi-convex Functions
Definition 4.4.1 (Quasi-concavity and Quasi-convexity). Let f : Rn ⊇ X → R be a real-valued
function defined on a convex set X. We say that f is quasi-concave (quasi-convex) if for all x′ and
x′′ in X and all λ ∈ [0, 1] we have
f [(1 − λ)x′ + λx′′ ] ≥ min{f (x′ ), f (x′′ )}
(f [(1 − λ)x′ + λx′′ ] ≤ max{f (x′ ), f (x′′ )}).
We say that f is strictly quasi-concave (quasi-convex) if for all x′ and x′′ in X and all λ ∈ (0, 1)
we have
f [(1 − λ)x′ + λx′′ ] > min{f (x′ ), f (x′′ )}
(f [(1 − λ)x′ + λx′′ ] < max{f (x′ ), f (x′′ )}).
Theorem 4.4.1. Let f be a real-valued function defined on a convex set X ⊆ Rn. Then f is quasi-concave (quasi-convex) if and only if the upper contour sets (lower contour sets) of f are all convex,
that is, if for any a ∈ R the set
Ua = {x ∈ X : f (x) ≥ a}
(La = {x ∈ X : f (x) ≤ a})
is convex.
Proof. Assume f is quasi-concave. Fix a ∈ R and let x′, x′′ ∈ Ua. Then for all λ ∈ [0, 1],

f(λx′ + (1 − λ)x′′) ≥ min{f(x′), f(x′′)}.

Since x′, x′′ ∈ Ua, f(x′) ≥ a and f(x′′) ≥ a, which implies min{f(x′), f(x′′)} ≥ a. Therefore, f(λx′ + (1 − λ)x′′) ≥ a and thus λx′ + (1 − λ)x′′ ∈ Ua. Hence, the upper contour set is convex. Now assume every upper contour set is convex. Take any x′, x′′ ∈ X and set a = min{f(x′), f(x′′)}, so that x′, x′′ ∈ Ua. Convexity of Ua gives λx′ + (1 − λ)x′′ ∈ Ua for all λ ∈ [0, 1], that is, f(λx′ + (1 − λ)x′′) ≥ a = min{f(x′), f(x′′)}. Thus, f is quasi-concave. A similar argument holds for quasi-convexity.
Figure 4.3: Quasi-concave but not concave
Theorem 4.4.2. All concave (convex) functions are quasi-concave (quasi-convex) and all strictly
concave (strictly convex) functions are strictly quasi-concave (strictly quasi-convex).
Proof. Without loss of generality assume f (x′ ) ≥ f (x′′ ). Since f is concave, for 0 ≤ λ ≤ 1
f (λx′ + (1 − λ)x′′ ) ≥ λf (x′ ) + (1 − λ)f (x′′ )
≥ λf (x′′ ) + (1 − λ)f (x′′ )
= min{f (x′ ), f (x′′ )}.
Thus, f is also quasi-concave. If f is strictly concave then the inequalities become strict and hence
f is strictly quasi-concave. The convex case can be proved in a similar fashion.
It is important to note that the converse of the above theorem is not valid in general (see figure 4.3). The function f defined on X = {x : x ≥ 0} by f(x) = x² is quasi-concave (its upper contour sets are convex) but not concave on X; in fact, it is strictly convex on X.
Theorem 4.4.3. Suppose f : Rn ⊇ X → R is quasi-concave (quasi-convex) and φ : f (X) → R
is increasing. Then φ ◦ f : X → R is quasi-concave (quasi-convex). If f is strictly quasi-concave
(quasi-convex) and φ is strictly increasing, then φ ◦ f is strictly quasi-concave (quasi-convex).
Proof. Consider any x′ , x′′ ∈ X. If f is quasi-concave, then
f (λx′ + (1 − λ)x′′ ) ≥ min{f (x′ ), f (x′′ )}.
Therefore, φ increasing implies
φ[f (λx′ + (1 − λ)x′′ )] ≥ φ[min{f (x′ ), f (x′′ )}] = min{φ[f (x′ )], φ[f (x′′ )]}.
Thus, φ ◦ f = φ[f (x)] is quasi-concave. If f is strictly quasi-concave and φ is strictly increasing,
the inequalities are strict and thus φ ◦ f is strictly quasi-concave. A similar argument holds for
quasi-convexity.
Corollary 4.4.1. Suppose f : Rn ⊇ X → R is quasi-concave and φ : f (X) → R is decreasing.
Then φ ◦ f : X → R is quasi-convex. Suppose f : X → R is quasi-convex and φ : f (X) → R is
decreasing. Then φ ◦ f : X → R is quasi-concave.
Proof. Consider any x′ , x′′ ∈ X. If f is quasi-concave, then
f (λx′ + (1 − λ)x′′ ) ≥ min{f (x′ ), f (x′′ )}.
Therefore, φ decreasing implies
φ[f (λx′ + (1 − λ)x′′ )] ≤ φ[min{f (x′ ), f (x′′ )}] = max{φ[f (x′ )], φ[f (x′′ )]}.
Thus, φ ◦ f = φ[f (x)] is quasi-convex. Similarly if f is quasi-convex.
Remark 4.4.1. The sum of quasi-concave functions need not be quasi-concave, unlike the sum of concave functions, which is concave. For instance, f1(x) = x³ and f2(x) = −x are both quasi-concave, but the sum f3(x) = f1(x) + f2(x) = x³ − x is neither quasi-concave nor quasi-convex.
Example 4.4.1. Show that the Cobb-Douglas function

f(x) = Π_{i=1}^n xi^{αi},

where αi > 0 for i = 1, . . . , n, is quasi-concave for x ≫ 0.

Solution: Consider the natural log of the Cobb-Douglas function,

ln f(x) = Σ_{i=1}^n αi ln xi,

which is concave since, for all i, ln xi is concave and the sum of concave functions is concave (Theorem 4.2.3). Thus, since concavity implies quasi-concavity, ln f(x) is also quasi-concave. The exponential function e^t is a strictly increasing function from R to R, hence f(x) = exp{ln f(x)} is quasi-concave by Theorem 4.4.3.
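A numerical spot check of the defining inequality for quasi-concavity (an editorial illustration, not part of the notes; the exponent values below are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.3, 0.5, 0.2])            # hypothetical positive exponents

def f(x):
    return np.prod(x ** alpha)

# Spot-check f(lam*x' + (1-lam)*x'') >= min(f(x'), f(x'')) at random strictly
# positive points; this illustrates (but of course does not prove) quasi-concavity.
ok = True
for _ in range(1000):
    xp, xpp = rng.uniform(0.1, 10.0, 3), rng.uniform(0.1, 10.0, 3)
    lam = rng.uniform()
    ok &= f(lam * xp + (1 - lam) * xpp) >= min(f(xp), f(xpp)) - 1e-12
print(ok)  # True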
4.5 Quasi-concavity, Quasi-convexity, and Definiteness
Theorem 4.5.1. Let f : Rn ⊇ X → R be a C² function defined on an open and convex set X ⊆ Rn, and let ∆k be the kth order leading principle minor of the bordered Hessian of f.

(a) A necessary condition for f to be quasi-concave (quasi-convex) is that
    (−1)^{k+1} ∆k ≥ 0 ∀k = 2, . . . , n, ∀x ∈ X
    (∆k ≤ 0 ∀k = 2, . . . , n, ∀x ∈ X).

(b) A sufficient condition for f to be quasi-concave (quasi-convex) is that
    (−1)^{k+1} ∆k > 0 ∀k = 2, . . . , n, ∀x ∈ Rn+
    (∆k < 0 ∀k = 2, . . . , n, ∀x ∈ Rn+).

(c) If X ⊆ Rn++, f is monotonically increasing (decreasing), and
    (−1)^{k+1} ∆k > 0 ∀k = 2, . . . , n, ∀x ∈ Rn+
    (∆k < 0 ∀k = 2, . . . , n, ∀x ∈ Rn+),
    then f is strictly quasi-concave (strictly quasi-convex).

Figure 4.4: Not Quasi-concave
Remark 4.5.1. Be very careful with the direction of the above conditions. In figure 4.4, the function f(x) = x² is not quasi-concave since the upper contour set {x ∈ R | f(x) ≥ 3} = (−∞, −√3] ∪ [√3, ∞) is not a convex set. However, the bordered Hessian is

H(f) = [ 0   2x ]
       [ 2x   2 ],

so (−1)^{2+1} ∆2 = 4x² ≥ 0 for all x, with strict inequality everywhere except at x = 0. Although this function fulfills the necessary condition for quasi-concavity, the function is not quasi-concave.
Example 4.5.1. Prove or give a counterexample: If f is a strictly convex function, then f cannot be quasi-concave.

Solution: False. For f(x) = 1/x defined on R++, the upper contour set {x : f(x) ≥ c} is convex for all c; for example, for c = 1, the upper contour set is the interval (0, 1], which is clearly convex. Thus, this function is quasi-concave. However, the function is also strictly convex, since f ′′(x) = 2/x³ > 0 for all x > 0.
Example 4.5.2. Consider f(x) = x³ + x. For x ∈ R, the second order condition, fxx = 6x, is not always nonpositive. Thus, this function is not concave. The bordered Hessian is given by

H(f) = [    0       3x² + 1 ]
       [ 3x² + 1      6x    ],

so (−1)^{2+1} ∆2 = (3x² + 1)² > 0. Therefore, this function is quasi-concave.
Example 4.5.3. For the region with x > 0 and −x < y < x, define f(x, y) = x² − y². Is f concave where defined? Is f quasi-concave where defined?

Solution: Given f(x, y) = x² − y²,

(a) fx = 2x, fxx = 2, fy = −2y, fxy = 0, fyy = −2. The Hessian is

H = [ 2   0 ]
    [ 0  −2 ].

Since ∆1 = 2 > 0 and ∆2 = −4 < 0, H is not negative semidefinite, so f is not concave.

(b) Bordering the Hessian with first partials, we obtain

H(x, y) = [   0    2x   −2y ]
          [  2x     2     0 ]
          [ −2y     0    −2 ],

(−1)^{2+1} ∆2 = (−1)³(−4x²) = 4x² > 0 for all x ≠ 0,
(−1)^{3+1} ∆3 = (−1)⁴ · 8(x² − y²) > 0 since |y| < x.

Since (−1)^{k+1} ∆k > 0 for k = 2, 3, f is quasi-concave.
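The bordered-Hessian computation in this example can be reproduced symbolically; the SymPy sketch below is an editorial addition, not part of the notes:

import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 - y**2

grad = sp.Matrix([f]).jacobian([x, y])     # [2*x, -2*y], a 1x2 row
H = sp.hessian(f, (x, y))                  # [[2, 0], [0, -2]]

# Bordered Hessian: zero in the corner, the gradient as the border, the Hessian below
BH = sp.Matrix.vstack(sp.Matrix.hstack(sp.zeros(1, 1), grad),
                      sp.Matrix.hstack(grad.T, H))

print(sp.factor(BH[:2, :2].det()))   # -4*x**2, so (-1)^(2+1) Delta_2 = 4x^2
print(sp.factor(BH.det()))           # 8*(x - y)*(x + y), i.e. Delta_3 = 8(x^2 - y^2)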
Example 4.5.4. Is the utility function u(x1, x2) = √x1 + √x2 quasi-concave for x1 > 0, x2 > 0? Is it concave for x1 > 0, x2 > 0?

Solution: Given u(x1, x2) = √x1 + √x2, the Hessian is given by

H = [ u11  u12 ]   [ −x1^{−3/2}/4         0        ]
    [ u21  u22 ] = [       0        −x2^{−3/2}/4   ].

For all (x1, x2) ∈ R2++, ∆1 = −x1^{−3/2}/4 < 0 and ∆2 = |H| = x1^{−3/2} x2^{−3/2}/16 > 0. Hence H is negative definite, which implies that u(x1, x2) is strictly concave on R2++. Since concavity implies quasi-concavity, u(x1, x2) is also quasi-concave on R2++.
Example 4.5.5. Is f(x, y) = x²y² concave and/or quasi-concave on {(x, y) ∈ R2 | x ≥ 0, y ≥ 0}? Is f concave and/or quasi-concave on R2?

Solution: The Hessian is given by

H = [ 2y²   4xy ]
    [ 4xy   2x² ].

For all x, y ≥ 0, ∆1 = 2y² ≥ 0 and ∆2 = |H| = −12x²y² ≤ 0. Thus, H is not negative semi-definite, which implies that f is not concave on R2+ (and also not concave on R2). Checking quasi-concavity, the bordered Hessian is

H = [   0     2xy²   2x²y ]
    [ 2xy²    2y²    4xy  ]
    [ 2x²y    4xy    2x²  ].

For all x, y > 0, (−1)^{2+1} ∆2 = 4x²y⁴ > 0 and (−1)^{3+1} ∆3 = 16x⁴y⁴ > 0. Thus, for (x, y) ∈ R2++, the sufficient conditions for quasi-concavity hold since the inequalities are both strict. It remains to check whether the function f is quasi-concave on R2+. Since {(x, y) ∈ R2+ | f(x, y) ≥ 0} = R2+, all upper contour sets are convex, and f is quasi-concave on R2+. Note f is not quasi-concave on R2 since

f((1/2)(1, 1) + (1/2)(−1, −1)) = f(0, 0) = 0 < 1 = min{f(1, 1), f(−1, −1)}.
Example 4.5.6. Is f(x, y) = ln(x + y) quasi-concave on the set of strictly positive x and y values?

Solution: The Hessian is given by

H = [ −1/(x + y)²   −1/(x + y)² ]
    [ −1/(x + y)²   −1/(x + y)² ].

For all x, y > 0, B1⁽¹⁾ = B1⁽²⁾ = −1/(x + y)² < 0 and ∆2 = |H| = 0. Thus, H is negative semidefinite on R2++, implying that f is concave, and hence quasi-concave.
Example 4.5.7. Is f(x, y, z) = √x + √y + z² concave and/or quasi-concave on R3++?

Solution: The Hessian is given by

H = [ −(1/4) x^{−3/2}         0             0 ]
    [        0         −(1/4) y^{−3/2}      0 ]
    [        0                0             2 ],

which is not negative semi-definite (∆1 < 0, ∆2 > 0, and ∆3 > 0), so the function is not concave. The bordered Hessian

[       0           (1/2) x^{−1/2}    (1/2) y^{−1/2}    2z ]
[ (1/2) x^{−1/2}   −(1/4) x^{−3/2}           0           0 ]
[ (1/2) y^{−1/2}          0           −(1/4) y^{−3/2}    0 ]
[      2z                 0                  0           2 ]

has determinant (√x + √y − 2z²)/(8 x^{3/2} y^{3/2}), which is positive for some (x, y, z) ∈ R3++ and negative for others. Thus the function is not quasi-concave on R3++.
Example 4.5.8. Is f(x, y) = x²y concave and/or quasi-concave on R2++? Explain.

Solution: The gradient and Hessian are respectively given by

∇f = [ 2xy ]        H = [ 2y  2x ]
     [ x²  ],           [ 2x   0 ].

H is not negative semidefinite since ∆2 = −4x² < 0. Hence f is not concave. The bordered Hessian is given by

H = [  0    2xy   x² ]
    [ 2xy   2y    2x ]
    [ x²    2x     0 ].

Thus, (−1)³ ∆2 = 4x²y² > 0 for all (x, y) ∈ R2++ and (−1)⁴ ∆3 = 6x⁴y > 0 for all (x, y) ∈ R2++. Thus f is quasi-concave.
Chapter 5
Optimization
5.1 Unconstrained Optimization
Theorem 5.1.1. If f : Rn → R and all its first partial derivatives are continuously differentiable on a set which contains the point x∗ in its interior, then

(a) (Necessary Condition) f has a local maximum (minimum) at x∗ only if ∇f(x∗) = 0 and H(f(x∗)) is negative (positive) semi-definite.
(b) (Sufficient Condition) f has a strict local maximum (minimum) at x∗ if ∇f (x∗ ) = 0 and
H(f (x∗ )) is negative (positive) definite.
(c) If H(f (x∗ )) is indefinite, then x∗ is neither a local maximum nor a local minimum.
Remark 5.1.1. Note that in the univariate case ∇f = 0 is replaced by f ′ = 0 and H(f ) NSD
(PSD, ND, and PD, respectively) is replaced by f ′′ ≤ (≥, <, >) 0.
Example 5.1.1. Let f(x, y) = −3x² + xy − 2x + y − y² + 1. Then

∇f = [ −6x + y − 2 ]        H(f) = [ −6   1 ]
     [ x + 1 − 2y  ],              [  1  −2 ],

with ∆1 = −6 < 0 and ∆2 = 11 > 0, so H(f) is negative definite. Setting ∇f(x, y) = 0 requires

[ −6   1 ] [ x ]   [  2 ]
[  1  −2 ] [ y ] = [ −1 ],

so

[ x ]   [ −6   1 ]⁻¹ [  2 ]          [ −2  −1 ] [  2 ]   [ −3/11 ]
[ y ] = [  1  −2 ]   [ −1 ] = (1/11) [ −1  −6 ] [ −1 ] = [  4/11 ].

Thus, f has a strict local maximum at (x, y) = (−3/11, 4/11).
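A numerical cross-check of this example (an editorial addition, not part of the notes): because f is quadratic, the gradient is linear and the critical point can be found with a single linear solve.

import numpy as np

# f(x, y) = -3x^2 + xy - 2x + y - y^2 + 1; its gradient is H @ (x, y) + b
H = np.array([[-6.0, 1.0],
              [1.0, -2.0]])
b = np.array([-2.0, 1.0])

crit = np.linalg.solve(H, -b)          # solves H z = -b, i.e. grad f(z) = 0
print(crit)                            # [-0.2727..., 0.3636...] = (-3/11, 4/11)
print(np.linalg.eigvalsh(H))           # both negative -> H negative definite -> strict max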
Definition 5.1.1 (Saddle Point). A critical point x∗ of f for which H(f (x∗ )) is indefinite is called
a saddle point of f . A saddle point is a minimum of f in some directions and a maximum of f in
other directions.
Example 5.1.2. Let f(x, y) = x³ + x²y + 2y². Then

∇f = [ 3x² + 2xy ]        H(f) = [ 6x + 2y   2x ]
     [ x² + 4y   ],              [   2x       4 ].

Here H(f) depends on the (x, y) at which it is evaluated. ∇f(x, y) = 0 if 3x² + 2xy = 0 and x² + 4y = 0. From the last equation, y = −x²/4. Substituting this value into the first equation gives (x²/2)(6 − x) = 0. Thus, (x, y) = (0, 0) or (x, y) = (6, −9).

H[f(0, 0)] = [ 0  0 ]
             [ 0  4 ],

which is positive semi-definite (B1⁽¹⁾ = 0, B1⁽²⁾ = 4, and B2 = 0).

H[f(6, −9)] = [ 18  12 ]
              [ 12   4 ]

and its leading principle minors are ∆1 = 18 > 0 and ∆2 = −72 < 0. Thus, since the Hessian is indefinite at (6, −9), this point is a saddle point for the function. The point (0, 0) satisfies the necessary conditions for a local minimum, but not the sufficient conditions. However, f(x, 0) = x³, so f cannot attain either a local maximum or a local minimum at (0, 0). This function has no local maxima or local minima.
Theorem 5.1.2 (Global Maxima/Minima). Let f : X → R be a C² function whose domain is a convex open subset X of Rn. If f is a concave (convex) function on X and ∇f(x∗) = 0 for some x∗ ∈ X, then x∗ is a global maximum (minimum) of f on X.

Remark 5.1.2. There is an interesting comparison between Theorems 5.1.1 and 5.1.2. In order to guarantee that a critical point x∗ of a C² function f is a local maximum (minimum), we need to show that H(f(x∗)) is negative (positive) definite; showing that H(f(x∗)) is negative (positive) semi-definite is not strong enough. However, if we can show that H(f(y)) is negative (positive) semi-definite not just at x∗ but for all y in a neighborhood of x∗, then by Theorem 5.1.2 we can conclude that x∗ is a maximum (minimum) of f.
Remark 5.1.3. A global maximum (or minimum) does not necessarily have to be a strict maximum
(or minimum). Moreover a strict maximum (or minimum) does not necessarily have to be a global
maximum (or minimum). To see this consider figure 5.1.
Figure 5.1: Strict and/or Global Extrema
(a) Global not Strict    (b) Global and Strict    (c) Strict not Global

Example 5.1.3. Find all local maxima and minima of f(x, y, z) = x² + x(z − 2) + 3(y − 1)² + z².

Solution: The associated first order conditions are

∂f/∂x = 2x + z − 2 = 0
∂f/∂y = 6(y − 1) = 0
∂f/∂z = x + 2z = 0.

Solving these equations gives (x, y, z) = (4/3, 1, −2/3). The Hessian matrix is

H = [ 2  0  1 ]
    [ 0  6  0 ]
    [ 1  0  2 ]

and the corresponding leading principle minors are

∆1 = 2 > 0,   ∆2 = 12 > 0,   ∆3 = |H| = 18 > 0.

Thus, H is positive definite, and therefore f is strictly convex. From the previous theorem, we can conclude that the point (x, y, z) = (4/3, 1, −2/3) is the unique global minimizer (and a strict local minimizer).
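The same check in code (an editorial sketch, not part of the notes); since f is quadratic, its first order conditions form the linear system below, whose coefficient matrix is also the Hessian:

import numpy as np

# First order conditions of f(x, y, z) = x^2 + x(z - 2) + 3(y - 1)^2 + z^2,
# written as a linear system A w = c with w = (x, y, z).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 6.0, 0.0],
              [1.0, 0.0, 2.0]])
c = np.array([2.0, 6.0, 0.0])

w = np.linalg.solve(A, c)
print(w)                        # [ 1.3333..., 1., -0.6666...] = (4/3, 1, -2/3)
print(np.linalg.eigvalsh(A))    # all positive -> Hessian PD -> strict (global) minimum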
5.2 Constrained Optimization I: Equality Constraints
Consider the problem of maximizing a function f (x) = f (x1 , x2 , . . . , xn ) of n variables constrained by m < n equality constraints. Let the functions g1 (x), g2 (x), . . . , gm (x) define the constraint set. Thus, our problem is to
maximize or minimize f (x1 , x2 , . . . , xn )
subject to g1 (x1 , x2 , . . . , xn ) = 0
g2 (x1 , x2 , . . . , xn ) = 0
..
.
gm (x1 , x2 , . . . , xn ) = 0
or, equivalently, in a more compact form
maximize or minimize f (x)
subject to g(x) = 0,
(5.1)
where g(x) = 0 denotes an m × 1 vector of constraints, m < n. The condition m < n is needed
to ensure a proper degree of freedom. Without this condition, there would not be a way for the
variables to adjust toward the optimum.
The most intuitive solution method for problem (5.1) involves the elimination of m variables
from the problem by use of the constraint equations, thereby converting the problem into an equivalent unconstrained optimization problem. The actual solution of the constraint equations for m
variables in terms of the remaining n − m can often prove a difficult, if not impossible, task. Moreover, the elimination of variables is seldom applicable to economic problems, as economic theory
rarely allows for the specification of particular functional forms. Nevertheless, the theory underlying the method of elimination of variables can be used to obtain analytically useful characterizations
of solutions to equality constrained problems.
Alternatively, the solution can be obtained using the Lagrangian function defined as

L(x; λ) = f(x) − λ1 g1(x) − λ2 g2(x) − · · · − λm gm(x) = f(x) − Σ_{i=1}^m λi gi(x),    (5.2)

where λ1, λ2, . . . , λm multiply the constraints and are known as Lagrange multipliers. In order to solve the problem, we find the critical points of the Lagrangian by solving the equations

∂L/∂x1 = 0,  ∂L/∂x2 = 0,  . . . ,  ∂L/∂xn = 0,
∂L/∂λ1 = 0,  ∂L/∂λ2 = 0,  . . . ,  ∂L/∂λm = 0,    (5.3)

which represent n + m equations for the n + m variables x1, x2, . . . , xn, λ1, λ2, . . . , λm. Thus, we have transformed what was a constrained problem of n variables into an unconstrained problem of n + m variables. Note that since λ1, λ2, . . . , λm simply multiply the constraints, ∂L/∂λi = 0, for i = 1, 2, . . . , m, is equivalent to each multiplier's respective constraint. Thus, the system of equations (5.3) can be written more compactly as

∇f(x) = λ∇g(x),
g(x) = 0,    (5.4)

where λ is a 1 × m vector of Lagrange multipliers and ∇g(x) is the m × n Jacobian matrix of the constraint set.
To understand this more clearly, consider the problem of maximizing f (x1 , x2 ) subject to
g(x1 , x2 ) = 0. Geometrically, our goal is to find the highest valued level-curve of f , which meets
the constraint set C (see figure 5.2). The highest level-curve of f cannot cross the constraint curve
C (see point x′ ); if it did, nearby higher level sets would also cross (see point x′′ ). Thus, the highest
level set of f must be tangent to C at the constrained max, x∗ .
The gradient vectors of the objective function and the constraint are given by

∇f(x) = [ ∂f/∂x1 ]              ∇g(x) = [ ∂g/∂x1 ]
        [ ∂f/∂x2 ]     and              [ ∂g/∂x2 ],
which are perpendicular to the level sets of f and g. Since the level sets of f and g have the same
slope at x∗ , the gradient vectors ∇f (x) and ∇g(x) must line up at x∗ . Thus they point in the same
direction or opposite directions (see figure 5.2). In either case, the gradients are scalar multiples
of each other. If the corresponding Lagrange multiplier is λ∗ , then ∇f (x∗ ) = λ∗ ∇g(x∗ ) as the
Lagrange formulation, given in (5.4), suggests.
Figure 5.2: Constrained Optimization and the Lagrange Principle
[Three panels: level curves of f crossing or tangent to the constraint set C at points x′, x′′, and x∗; at the constrained optimum x∗ the gradients ∇f(x∗) and ∇g(x∗) are collinear.]
Remark 5.2.1. In order for this transformation to remain valid, we must place a restriction on the constraint set known as constraint qualification, which requires:

(a) ∇g(x∗) ≠ 0 if the problem defined in (5.1) has only one constraint; and

(b) the rank of the Jacobian matrix

J(x∗) = [ ∇g1(x∗) ]   [ ∂g1(x∗)/∂x1  · · ·  ∂g1(x∗)/∂xn ]
        [ ∇g2(x∗) ]   [ ∂g2(x∗)/∂x1  · · ·  ∂g2(x∗)/∂xn ]
        [    ⋮    ] = [      ⋮          ⋱         ⋮     ]
        [ ∇gm(x∗) ]   [ ∂gm(x∗)/∂x1  · · ·  ∂gm(x∗)/∂xn ]

equals m (full rank) if the problem defined in (5.1) has m constraints, m > 1.
Example 5.2.1. This example illustrates why the rank m condition is required for the transformation given in (5.2). Suppose our problem is to

maximize f(x1, x2, x3) = x1
subject to g1(x1, x2, x3) = (x1 − 1)² − x3 = −1
           g2(x1, x2, x3) = (x1 − 1)² + x3 = 1.

Then the gradient vectors are

∇f = [ 1 ]        ∇g1 = [ 2(x1 − 1) ]        ∇g2 = [ 2(x1 − 1) ]
     [ 0 ]              [     0     ]              [     0     ]
     [ 0 ],             [    −1     ],             [     1     ].

The set of points satisfying both constraints is {(1, y, 1) | y ∈ R}. If the transformation in (5.2) were valid, (5.4) would imply

(1, 0, 0) = λ1 (0, 0, −1) + λ2 (0, 0, 1),

which is not possible. The problem here is that the constraint gradients are linearly dependent at every feasible point, so constraint qualification is not satisfied (rank J(x∗) = 1 < m) and the transformation is not valid.
Remark 5.2.2. Constraint qualification says that for transformation (5.2) to be valid, no point satisfying the constraint set can be a critical point of the constraint set. This means that if the constraint
set is linear, constraint qualification will automatically be satisfied.
Theorem 5.2.1 (Necessary and Sufficient Conditions for an Extremum). Let f, g1, g2, . . . , gm be C² real-valued functions on Rn. Consider the problem of maximizing (minimizing) f on the constraint set g(x) = 0, where g(x) = 0 denotes an m × 1 vector of constraints, m < n. Then

(a) (Necessary Condition) ∇L(x∗, λ∗) = 0.

(b) (Sufficient Condition) If there exist vectors x∗ ∈ Rn and λ∗ = (λ∗1, λ∗2, . . . , λ∗m) ∈ Rm such that

∇L(x∗, λ∗) = 0

and, for every non-zero vector z ∈ Rn satisfying

∇gi(x∗) · z = 0,   i = 1, 2, . . . , m,

it follows that

zᵀ ∇²ₓL(x∗, λ∗) z < (>) 0    (Hessian is negative (positive) definite),

then f has a strict local maximum (minimum) at x∗.
Conveniently, these conditions for a maximum or minimum can be stated in terms of the Hessian of the Lagrangian function, which turns out to be a bordered Hessian. The following rules work with the bordered Hessian of a constrained optimization problem of the form:

H = [ Lλ1λ1  · · ·  Lλ1λm    Lλ1x1  · · ·  Lλ1xn ]
    [   ⋮             ⋮        ⋮             ⋮   ]
    [ Lλmλ1  · · ·  Lλmλm    Lλmx1  · · ·  Lλmxn ]
    [ Lx1λ1  · · ·  Lx1λm    Lx1x1  · · ·  Lx1xn ]
    [   ⋮             ⋮        ⋮             ⋮   ]
    [ Lxnλ1  · · ·  Lxnλm    Lxnx1  · · ·  Lxnxn ]

  = [ 0      · · ·  0        g¹x1   · · ·  g¹xn  ]
    [ ⋮             ⋮         ⋮             ⋮    ]
    [ 0      · · ·  0        gᵐx1   · · ·  gᵐxn  ]
    [ g¹x1   · · ·  gᵐx1     Lx1x1  · · ·  Lx1xn ]
    [ ⋮             ⋮         ⋮             ⋮    ]
    [ g¹xn   · · ·  gᵐxn     Lxnx1  · · ·  Lxnxn ],

that is, H has the block structure [ 0  B ; Bᵀ  A ], where the m × m block of zeros collects the second derivatives of L with respect to the multipliers, B is the m × n Jacobian of the constraint functions (entry (j, i) is gʲxi ≡ ∂gj/∂xi), and A = [Lxixj] is the n × n Hessian of L with respect to the choice variables x.
Criterion 5.2.1 (Sufficient Conditions for strict local Maximum with constraints). Let f and constraints g1 , g2 , . . . , gm be twice continuously differentiable real-valued functions. If there exist vectors x∗ ∈ Rn , λ∗ = (λ∗1 , λ∗2 , . . . , λ∗m ) ∈ Rm such that
∇L(x∗ , λ∗ ) = 0
and if H is negative definite on the constraint set, which is the case if (−1)k ∆m+k > 0 for k =
m + 1, ..., n, where m is the number of constraints that hold with equality, n is the number of
endogenous variables, and ∆k is the leading principle minor of order k (Note: Lagrange multipliers
do not count as endogenous variables), then f has a strict local maximum at x∗ .
Criterion 5.2.2 (Sufficient Conditions for strict local Minimum with constraints). Let f and constraints g1 , g2 , . . . , gm be twice continuously differentiable real-valued functions. If there exist vectors x∗ ∈ Rn , λ∗ = (λ∗1 , λ∗2 , . . . , λ∗m ) ∈ Rm such that
∇L(x∗ , λ∗ ) = 0
and if H is positive definite on the constraint set, which is the case if (−1)m ∆m+k > 0 for k =
m + 1, ..., n, then f has a strict local minimum at x∗ .
Remark 5.2.3. In short, we must check n − m leading principle minors starting with the principle
minor of highest order and working backwards. For example, if a problem contains 5 variables and
3 constraints, it will be necessary to check the signs of two principle minors: ∆7 and ∆8 .
Example 5.2.2 (Minimizing Cost subject to an Output Constraint). Consider a production function given by

y = 20x1 − x1² + 15x2 − x2².

Let the prices of x1 and x2 be 10 and 5, respectively, with an output constraint of 55. Then to minimize the cost of producing 55 units of output given these prices, we set up the following Lagrangian:

L(x1, x2, λ) = (10x1 + 5x2) − λ(20x1 − x1² + 15x2 − x2² − 55),
which has first order conditions

∂L/∂x1 = 10 − λ(20 − 2x1) = 0
∂L/∂x2 = 5 − λ(15 − 2x2) = 0
∂L/∂λ:  20x1 − x1² + 15x2 − x2² − 55 = 0.
If we take the ratio of the first two first order conditions, we obtain

2 = (20 − 2x1)/(15 − 2x2)   →   30 − 4x2 = 20 − 2x1   →   10 − 4x2 = −2x1   →   x1 = 2x2 − 5.

Now plug this into the last first order condition to obtain

20(2x2 − 5) − (2x2 − 5)² + 15x2 − x2² − 55 = 0.
Multiplying out and solving for x2 gives

40x2 − 100 − 4x2² + 20x2 − 25 + 15x2 − x2² − 55 = 0
→ x2² − 15x2 + 36 = 0
→ (x2 − 12)(x2 − 3) = 0.

Therefore, we have two potential solutions, (x1, x2) = (19, 12) and (x1, x2) = (1, 3). The Lagrange multiplier λ is obtained by plugging the solutions into the first order conditions:

10 − λ(20 − 2(19)) = 0   →   λ = −5/9,
10 − λ(20 − 2(1)) = 0    →   λ = 5/9.
Given that ∇y(19, 12) = (−18, −9) ≠ 0 and ∇y(1, 3) = (18, 9) ≠ 0, constraint qualification holds. To check for a maximum or minimum, we set up the bordered Hessian. Consider first the point (19, 12, −5/9). The bordered Hessian in this case is

H = [ ∂²L/∂x1²     ∂²L/∂x1∂x2   ∂g/∂x1 ]   [    2λ         0       20 − 2x1 ]
    [ ∂²L/∂x2∂x1   ∂²L/∂x2²     ∂g/∂x2 ] = [     0        2λ       15 − 2x2 ]
    [ ∂g/∂x1       ∂g/∂x2          0   ]   [ 20 − 2x1  15 − 2x2        0    ]

  = [ −10/9     0     −18 ]
    [    0    −10/9    −9 ]
    [  −18     −9       0 ].

Since we have only two endogenous variables (n = 2) and one constraint (m = 1), it is sufficient to check only the principle minor of highest order (∆3 or, more precisely, the determinant of the bordered Hessian) in order to determine definiteness:

|H| = (−10/9) | −10/9  −9 |  +  (−18) |   0   −10/9 |
              |  −9     0 |           [ −18    −9   |
    = (−10/9)(−81) + (−18)(−20)
    = 450.

Since k = 2, (−1)² ∆3 = (−1)²(450) > 0. Therefore, H is negative definite on the constraint set, and thus this point is a strict local maximum.
Now consider the other point, (1, 3, 5/9). The bordered Hessian is given by

[    2λ         0       20 − 2x1 ]   [ 10/9    0    18 ]
[     0        2λ       15 − 2x2 ] = [   0   10/9    9 ]
[ 20 − 2x1  15 − 2x2        0    ]   [  18     9     0 ].

Again, it is sufficient to check only the determinant of the bordered Hessian in order to determine definiteness. In this case, det H = −450 and (−1)∆3 = (−1)(−450) > 0. Therefore, H is positive definite on the constraint set, and thus this point is a strict local minimum. The minimum cost is obtained by substituting this point into the cost expression (objective function): C = 10(1) + 5(3) = 25.
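As a numerical cross-check (an editorial addition, not part of the notes), SciPy's SLSQP solver applied to this cost-minimization problem recovers the cost-minimizing point found above; the starting point below is an arbitrary guess.

import numpy as np
from scipy.optimize import minimize

cost = lambda v: 10 * v[0] + 5 * v[1]
output = lambda v: 20 * v[0] - v[0]**2 + 15 * v[1] - v[1]**2 - 55   # = 0 on the constraint

res = minimize(cost, x0=[2.0, 4.0], method="SLSQP",
               constraints=[{"type": "eq", "fun": output}])
print(res.x)     # approximately [1., 3.]
print(res.fun)   # approximately 25.0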
Example 5.2.3. Consider the problem of maximizing x²y²z² subject to the constraint g(x, y, z) = x² + y² + z² = 3. The Lagrangian function is given by

L = x²y²z² + λ(3 − x² − y² − z²)

and the corresponding first order conditions are

∂L/∂x = 2xy²z² − 2λx = 0
∂L/∂y = 2x²yz² − 2λy = 0
∂L/∂z = 2x²y²z − 2λz = 0
∂L/∂λ:  x² + y² + z² − 3 = 0,

with solution x² = y² = z² = λ = 1. Since ∇g(±1, ±1, ±1) = (±2, ±2, ±2) ≠ (0, 0, 0), constraint qualification holds. At x = y = z = λ = 1, the bordered Hessian for this problem is

H = [ 0        2x            2y            2z        ]   [ 0  2  2  2 ]
    [ 2x   2y²z² − 2λ       4xyz²         4xy²z      ]   [ 2  0  4  4 ]
    [ 2y      4xyz²       2x²z² − 2λ      4x²yz      ] = [ 2  4  0  4 ]
    [ 2z      4xy²z         4x²yz       2x²y² − 2λ   ]   [ 2  4  4  0 ].

Since n = 3 and m = 1, we have to check the signs of the two leading principle minors of highest order, ∆3 and ∆4. After computation, we find ∆3 = 32 and ∆4 = −192. For k = 2, (−1)²∆3 = (−1)²(32) > 0 and for k = 3, (−1)³∆4 = (−1)³(−192) > 0. Therefore, H is negative definite on the constraint set, and thus this point is a strict local maximum. By the properties of determinants, the remaining seven critical points also satisfy the sufficiency conditions and are classified as local maxima. This is an example of a situation where the solution is globally optimal, but not unique.
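The leading principle minors of the bordered Hessian at the critical point can be checked numerically; the sketch below is an editorial addition, not part of the notes.

import numpy as np

# Bordered Hessian at x = y = z = lambda = 1 from Example 5.2.3
H = np.array([[0.0, 2.0, 2.0, 2.0],
              [2.0, 0.0, 4.0, 4.0],
              [2.0, 4.0, 0.0, 4.0],
              [2.0, 4.0, 4.0, 0.0]])

d3 = np.linalg.det(H[:3, :3])
d4 = np.linalg.det(H)
print(round(d3), round(d4))                    # 32, -192
print((-1)**2 * d3 > 0, (-1)**3 * d4 > 0)      # True, True -> negative definite on the constraint set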
5.3 Constrained Optimization II: Non-negative Variables
To map our problem in section 5.2 into one that makes greater economic sense, consider the following problem, which postulates that the variables x1 , x2 , . . . , xj are subject to inequality constraints.
maximize or minimize f (x1 , x2 , . . . , xn )
subject to g1 (x1 , x2 , . . . , xn ) = 0
g2 (x1 , x2 , . . . , xn ) = 0
..
.
gm (x1 , x2 , . . . , xn ) = 0
x1 , x2 , . . . , xj ≥ 0 xj+1 , . . . , xn > 0.
If the optimum, x∗ , happens to be that these requirements are not binding, that is, x1 , x2 , . . . , xj
are in fact strictly positive, then the procedure outlined in the preceding section for determining
optimal points remains unaltered. That is, assigning a Lagrange multiplier λi , i = 1, 2, . . . , m to
each constraint, the Lagrangian function can once again be written as
L(x1, . . . , xn, λ1, . . . , λm) = f(x1, x2, . . . , xn) − λ1 g1(x1, . . . , xn) − · · · − λm gm(x1, . . . , xn)
                                  = f(x1, x2, . . . , xn) − Σ_{k=1}^m λk gk(x1, . . . , xn)
and the first order necessary conditions satisfied at the optimum x∗ are

∂L/∂xj = 0,   j = 1, 2, . . . , n    (5.5)
∂L/∂λk = 0,   k = 1, 2, . . . , m,    (5.6)

so long as constraint qualification is satisfied. However, it is often the case that some components are positive while others are zero. Thus, an equation like (5.5) should hold for the partial derivative of the Lagrangian with respect to every component that is strictly positive and an inequality with respect to every component that is zero. In other words, for x1, x2, . . . , xn, we should have

∂L/∂xj ≤ 0,   xj ≥ 0,    (5.7)

with at least one of these holding with equality. The requirement that at least one inequality in (5.7) should hold as an equality is sometimes stated more compactly as

xj ∂L/∂xj = 0.
The point is that the product is zero only if at least one of the factors is zero. A pair of inequalities
like (5.7), not both of which can be strict, is said to show Complementary Slackness, which we will
denote “CS”. A single inequality, say xj ≥ 0, is binding if it holds as an equality, that is, if xj is at
the extreme limit of its permitted range; the inequality is said to be slack if xj is positive, meaning it
has some room to maneuver before hitting its extreme. Each one of the pair of inequalities in (5.7)
therefore complements the slackness of the other; if one is slack the other is binding.
The intuition is as follows: if x∗j > 0, the constraint is not binding and it is possible to adjust xj until the marginal benefit of further adjustments is zero (∂L/∂xj = 0), given the other constraints. If, on the other hand, x∗j = 0, then the constraint is binding and the marginal benefit of increasing xj is nonpositive (∂L/∂xj ≤ 0); in this case, the objective value could possibly be improved if negative xj values were permitted.
Example 5.3.1. Consider the following problem:
Maximize f (x, y) = 1 − 8x + 10y − 2x2 − 3y 2 + 4xy
subject to x ≥ 0 and y ≥ 0.
Begin by calculating the gradient and Hessian:

∇f(x, y) = [ −8 − 4x + 4y ]        H(f) = [ −4   4 ]
           [ 10 − 6y + 4x ],               [  4  −6 ].
For all (x, y), the Hessian matrix is negative definite, since ∆1 = −4 < 0 and ∆2 = 8 > 0. Hence
f is a concave function, and any (x, y) that satisfies complementary slackness is a constrained global
maximum. It remains to locate such a point.
Our first pass at the problem is to look for a solution with x > 0, y > 0 so that ∇f (x, y) = 0.
Equating the gradient to the zero-vector gives
x − y = −2   and   2x − 3y = −5.
These equations are satisfied only when x = −1 and y = 1, which obviously violates the fact that
x > 0. Thus our problem is not yet solved. However, the calculation just made was not a wasted
effort, for we have in fact found the unconstrained maximum. And, since this has x < 0, it is likely
that x = 0 at the constrained maximum. We therefore look for a solution with x = 0 and y > 0, so
that ∂f/∂y = 0. Equating x and ∂f/∂y to zero, we see that 10 − 6y = 0, so y = 5/3. Thus the point (0, 5/3) satisfies the conditions

x = 0,   y > 0,   ∂f/∂y = 0,

and it remains to show that this point satisfies the remaining condition for a constrained maximum, namely ∂f/∂x < 0. At (0, 5/3),

∂f/∂x = −4(2 + 0 − 5/3) = −4/3 < 0,
so the condition is satisfied. In this case n = 2 and m = 2, which implies that there are no second
order conditions to check. Thus the constrained maximum is attained where x = 0 and y = 5/3;
the constrained maximum value of f is therefore 28/3. This is of course less than the value taken
by f at the unconstrained maximum (−1, 1), which is in fact 10.
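As a numerical cross-check (an editorial addition, not part of the notes), the same maximization can be handed to SciPy with simple non-negativity bounds:

import numpy as np
from scipy.optimize import minimize

f = lambda v: 1 - 8*v[0] + 10*v[1] - 2*v[0]**2 - 3*v[1]**2 + 4*v[0]*v[1]

# maximize f by minimizing -f over x >= 0, y >= 0
res = minimize(lambda v: -f(v), x0=[1.0, 1.0],
               bounds=[(0, None), (0, None)])
print(res.x)     # approximately [0., 1.6667] = (0, 5/3)
print(-res.fun)  # approximately 9.3333 = 28/3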
Example 5.3.2. Define the profit function as

Π(x, y) = (1/10)(−80 + 24x + 78y − 3x² − 3xy − 4y²),

where x and y are the quantities of two different goods. We wish to choose x and y so as to maximize Π(x, y) subject to the constraints x ≥ 0 and y ≥ 0.

In this case, ∂Π/∂x = (1/10)(24 − 6x − 3y) and ∂Π/∂y = (1/10)(78 − 3x − 8y). The Hessian matrix

H = [ −3/5    −3/10 ]
    [ −3/10   −4/5  ]

has principle minors ∆1 < 0 and ∆2 > 0. Therefore H is negative definite. Hence the profit
function is concave, and the first order conditions give a global constrained maximum. It is not
hard to see that the only values of x and y for which ∂Π/∂x = ∂Π/∂y = 0 are x = −14/13 and
y = 132/13, which clearly violates the constraints. We therefore look for a solution (x, y) such that
x = 0, y > 0, ∂Π/∂x ≤ 0, and ∂Π/∂y = 0. The first, second, and fourth conditions are satisfied
where x = 0 and y = 9.75. Since 9.75 > 8, the third condition is also satisfied. In this case n = 2
and m = 2, which implies that there are no second order conditions to check. Hence the solution is
x = 0, y = 9.75 and the maximal profit is 30.025.
5.4 Constrained Optimization III: Inequality Constraints
To find the constrained maximum or minimum of a function, we simply constructed the Lagrangian,
set its (m+n) first order conditions equal to zero, and then solved these (m+n) equations in (m+n)
unknowns. However, the vast majority of constrained optimization problems that arise in economics
have their constraints defined by inequalities:
g1 (x1 , x2 , . . . , xn ) ≤ 0, g2 (x1 , x2 , . . . , xn ) ≤ 0, . . . , gm (x1 , x2 , . . . , xn ) ≤ 0.
Unfortunately, the method for finding constrained maxima in problems with inequality constraints is a bit more complex than the method we used for equality constraints. The first order conditions involve both equalities and inequalities, and their solution entails the investigation of a number of cases. To see this more clearly, consider the following problem:
maximize f (x1 , x2 , . . . , xn )
subject to g1 (x1 , x2 , . . . , xn ) ≤ 0
g2(x1, x2, . . . , xn) ≤ 0
⋮
gm(x1, x2, . . . , xn) ≤ 0
x1, x2, . . . , xn ≥ 0.    (5.8)
As an alternative to the procedure outlined in the previous section, we could introduce n new
constraints in addition to the m original ones:
gm+1(x) = −x1 ≤ 0,   . . . ,   gm+n(x) = −xn ≤ 0.    (5.9)

Then, if we introduce Lagrange multipliers λ1, . . . , λm that are associated with the constraints and µ1, . . . , µn to go with the non-negativity constraints, our Lagrangian function is of the form

L(x, λ, µ) = f(x) − Σ_{j=1}^m λj gj(x) − Σ_{i=1}^n µi (−xi),    (5.10)
where λ = {λ1 , . . . , λm } and µ = {µ1 , . . . , µn }. The necessary conditions for x∗ to solve this
problem are
∂f(x∗)/∂xi − Σ_{j=1}^m λj ∂gj(x∗)/∂xi + µi = 0,   i = 1, . . . , n    (i)
gj(x∗) ≤ 0,  λj ≥ 0  (λj = 0 if gj(x∗) < 0),   j = 1, . . . , m    (ii)
xi ≥ 0,  µi ≥ 0  (µi = 0 if xi > 0),   i = 1, . . . , n.    (iii)

To reduce this collection of m + n constraints and m + n Lagrange multipliers, the necessary conditions for problem (5.8) are often formatted slightly differently. In fact, it follows from (i) that ∂f(x∗)/∂xi − Σ_{j=1}^m λj ∂gj(x∗)/∂xi = −µi. Since µi ≥ 0 and −µi = 0 if xi > 0, we see that (i) and (iii) together are equivalent to the condition

∂f(x∗)/∂xi − Σ_{j=1}^m λj ∂gj(x∗)/∂xi ≤ 0  (= 0 if x∗i > 0),   i = 1, . . . , n.
With the possibility of inequality constraints, there are now two kinds of possible solutions: one
where the constrained optimum lies in the region where gj (x) < 0, in which case constraint j
is slack, and one where the constrained optimum lies on the boundary gj (x) = 0, in which case
constraint j is binding. In the former case, the function gj (x) plays no role. x∗ still corresponds to
the optimum of the Lagrangian given in (5.10), but this time with λj = 0. The latter case, where the
optimum lies on the boundary of each constraint, is analogous to the equality constraint discussed
previously and corresponds to the optimum of the Lagrangian with λ_j ≠ 0 for all j. In this case,
however, the sign of the Lagrange multiplier is crucial, because the objective function f(x) will
only be at a maximum if its gradient is oriented away from the region g(x) < 0 (i.e., ∇g(x) and
∇f(x) point in the same direction). We therefore have ∇f(x) = λ∇g(x) with λ_j ≥ 0 for all j (if
the constraint were written as g(x) ≥ 0, the gradient vectors would point in opposite directions and
∇f(x) = −λ∇g(x) with λ_j ≥ 0 for all j).
Remark 5.4.1 (Constraint Qualification). In order for the transformation in (5.10) to be valid, the
gradient vectors ∇gj (x∗ ) (j = 1, . . . , m) corresponding to those constraints that are binding at x∗
must be linearly independent. In other words, the corresponding Jacobian matrix must be full rank.
Remark 5.4.2. When solving optimization problems subject to inequality constraints, it is best to
map the problem into the standard form given in (5.8). If the problem is one of minimizing f (x), the
equivalent problem of maximizing −f (x) should be solved. Also, all inequality constraints should
be written as gj (x) ≤ 0 (i.e. if the original constraint was rj (x) ≤ bj , then gj (x) = rj (x) − bj ,
while if the original was rj (x) ≥ bj , then gj (x) = bj − rj (x)).
Theorem 5.4.1 (Kuhn-Tucker Necessary Conditions). Suppose that x* = (x*_1, . . . , x*_n) solves
(5.8) and that the constraint qualification is satisfied. Then there exist unique numbers λ*_1, . . . , λ*_m such that
(a) ∂f(x*)/∂x_i − Σ_{j=1}^{m} λ*_j ∂g_j(x*)/∂x_i ≤ 0  (= 0 if x*_i > 0),   i = 1, . . . , n
(b) g_j(x*) ≤ 0,  λ*_j ≥ 0  (= 0 if g_j(x*) < 0),   j = 1, . . . , m.
Theorem 5.4.2 (Kuhn-Tucker Sufficient Conditions). Consider problem (5.8) and suppose that x*
and λ*_1, . . . , λ*_m satisfy conditions (a) and (b) in Theorem 5.4.1. If the Lagrangian
L(x) = f(x) − Σ_{j=1}^{m} λ*_j g_j(x) is concave in x, then x* is optimal.
Note that in this formulation of the necessary/sufficient conditions we use the ordinary Lagrangian, not the extended Lagrangian used earlier. The exhaustive procedure for finding a solution
using this theorem involves searching among all 2^{m+n} patterns that are possible from the (m + n)
complementary slackness conditions. Fortunately, short-cuts are usually available.
Theorem 5.4.3 (Conditions for Globality).
(a) If the feasible set is compact and the objective function is continuous, then the best of the
local solutions is the global solution.
(b) If the feasible set is convex and the objective function is concave, then any point satisfying
the first-order conditions is a global maximizer. If the feasible set is convex and the objective
function is strictly concave, then any point satisfying the first-order conditions is the unique
global maximizer. (Similar conclusions hold for convex objective functions and minimizers.)
(c) If the feasible set is convex and the objective function is quasi-concave, then any point satisfying the first-order conditions (with ∇f ≠ 0) is a global maximizer. If, in addition, the feasible
set is strictly convex or the objective function is strictly quasi-concave, then any point satisfying the first-order conditions (with ∇f ≠ 0) is the unique global maximizer. [To see why
we need ∇f ≠ 0, consider the problem of maximizing f(x, y) = xy subject to x ≥ 0, y ≥ 0,
and x + y ≤ 2. The feasible set is convex and f is quasi-concave on R^2_+. The first-order
conditions hold at (0, 0) with ∇f(0, 0) = 0, but (0, 0) is clearly not even a local maximizer.]
Example 5.4.1. Now let us apply the above rules to the following maximization problem:
max xy   subject to   x + y ≥ −1   and   x + y ≤ 2.
The associated Lagrangian function is
L(x, y, λ, µ) = xy + λ(x + y + 1) + µ(2 − x − y)
and the first order conditions are as follows:
∂L/∂x = y + λ − µ = 0
∂L/∂y = x + λ − µ = 0
∂L/∂λ = x + y + 1 ≥ 0,   λ ≥ 0   (with "CS")
∂L/∂µ = 2 − x − y ≥ 0,   µ ≥ 0   (with "CS")
The bordered Hessian is

    H = [ L_λλ  L_λµ  L_λx  L_λy ]   [ 0   0   1   1 ]
        [ L_µλ  L_µµ  L_µx  L_µy ] = [ 0   0  −1  −1 ]
        [ L_xλ  L_xµ  L_xx  L_xy ]   [ 1  −1   0   1 ]
        [ L_yλ  L_yµ  L_yx  L_yy ]   [ 1  −1   1   0 ].
Then we have these solutions for the following cases:

Case 1: λ = 0 = µ results in s*_1 = (0, 0, 0, 0). Thus, we no longer have any binding constraints;
that is, they drop out of the Lagrangian. Therefore, m = 0 and n = 2 and we must check
the two leading principal minors of highest magnitude. With the constraints dropping out, the
bordered Hessian becomes
    H = [ 0  1 ]
        [ 1  0 ].
We then see that for k = 2, (−1)^2 ∆_2 = −1 < 0, which violates the rule for negative
definiteness. Thus, this point is not a local maximum.
Case 2: λ = 0, µ > 0 results in s*_2 = (1, 1, 0, 1). Then we have one binding constraint, so that
m = 1 and n = 2. Thus, we must check only the last leading principal minor. The bordered
Hessian is
    H = [  0  −1  −1 ]
        [ −1   0   1 ]
        [ −1   1   0 ].
Then the rule is (−1)^2 ∆_3 = det H = 2 > 0. This fulfills the rule, hence this point is a local
maximum.
Case 3: λ > 0, µ = 0 results in s*_3 = (−1/2, −1/2, 1/2, 0). Again, we have one binding constraint, so that m = 1 and n = 2, and we must check only the last leading principal minor.
The bordered Hessian is
    H = [ 0  1  1 ]
        [ 1  0  1 ]
        [ 1  1  0 ].
Then the rule is (−1)^2 ∆_3 = det H = 2 > 0. Again, this fulfills the rule, hence this point is a
local maximum.
Case 4: λ > 0, µ > 0 results in no solution.
Comparing f(1, 1) = 1 with f(−1/2, −1/2) = 1/4 and noting that the feasible set is convex and
the objective function is quasi-concave implies that (x, y) = (1, 1) is the unique global maximizer.
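As a quick numerical cross-check of this example, the following minimal Python sketch (assuming SciPy is available; the solver, starting point, and tolerances are illustrative choices, not part of the notes) maximizes xy by minimizing −xy subject to the two inequality constraints:

```python
# Numerical cross-check of Example 5.4.1, assuming SciPy is available.
# Maximize xy by minimizing -xy subject to x + y >= -1 and 2 - x - y >= 0.
import numpy as np
from scipy.optimize import minimize

objective = lambda v: -(v[0] * v[1])                      # maximize xy <=> minimize -xy
constraints = [
    {"type": "ineq", "fun": lambda v: v[0] + v[1] + 1},   # x + y + 1 >= 0
    {"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]},   # 2 - x - y >= 0
]

res = minimize(objective, x0=[0.5, 0.5], constraints=constraints, method="SLSQP")
print(res.x, -res.fun)   # approximately [1. 1.] and 1.0, matching Case 2
```

From this starting point the solver converges to (1, 1); since the problem is not convex, a general solver only finds a local solution, which here coincides with the global maximizer identified above.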
Example 5.4.2. Consider the following problem:
min x^2 + 2y^2 + 3z^2   subject to   3x + 2y + z ≥ 17.
To put the problem in standard form, we maximize the negative of the objective function (which is equivalent to minimizing the original). The Lagrangian function is
L = −(x^2 + 2y^2 + 3z^2) + λ(3x + 2y + z − 17).
The first order conditions are:
∂L/∂x = −2x + 3λ = 0,                                    (i)
∂L/∂y = −4y + 2λ = 0,                                    (ii)
∂L/∂z = −6z + λ = 0,                                     (iii)
∂L/∂λ = 3x + 2y + z − 17 ≥ 0,   λ ≥ 0   (with "CS").     (iv)
Case 1 (λ = 0): From (i)-(iii) x = 0, y = 0, z = 0, which violates (iv).
Case 2 (λ > 0): From (iv), 3x + 2y + z − 17 = 0. Solving (i)-(iii) along with this equation, results
in (x, y, z, λ) = (9/2, 3/2, 1/2, 3). Now check the second order conditions. The bordered
Hessian of the Lagrangian is

    H = [ L_λλ  L_λx  L_λy  L_λz ]   [ 0   3   2   1 ]
        [ L_xλ  L_xx  L_xy  L_xz ] = [ 3  −2   0   0 ]
        [ L_yλ  L_yx  L_yy  L_yz ]   [ 2   0  −4   0 ]
        [ L_zλ  L_zx  L_zy  L_zz ]   [ 1   0   0  −6 ].

Then we know that n = 3 and m = 1, so that according to our rule we have to check whether
(−1)^2 ∆_{1+2} > 0 for k = 2 and (−1)^3 ∆_{1+3} > 0 for k = 3. For k = 2, we find that
    (−1)^2 det[ 0 3 2 ; 3 −2 0 ; 2 0 −4 ] = 44 > 0,
and for k = 3, expanding det H along its last column,
    (−1)^3 det H = (−1)^3 [ (−1) det[ 3 −2 0 ; 2 0 −4 ; 1 0 0 ] + (−6) det[ 0 3 2 ; 3 −2 0 ; 2 0 −4 ] ]
                 = (−1)^3 [ −8 − 264 ] = 272 > 0.
Thus, we have found a local maximum. The original objective function is strictly convex and
the feasible set is convex as well, so (x, y, z) = (9/2, 3/2, 1/2), with multiplier λ = 3, is the unique global minimizer.
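A short numerical sanity check of this example, again assuming SciPy is available (the solver and starting point are illustrative):

```python
# Numerical check of Example 5.4.2: minimize x^2 + 2y^2 + 3z^2 s.t. 3x + 2y + z >= 17.
import numpy as np
from scipy.optimize import minimize

objective = lambda v: v[0]**2 + 2*v[1]**2 + 3*v[2]**2
constraint = {"type": "ineq", "fun": lambda v: 3*v[0] + 2*v[1] + v[2] - 17}

res = minimize(objective, x0=[1.0, 1.0, 1.0], constraints=[constraint], method="SLSQP")
print(res.x)    # approximately [4.5, 1.5, 0.5]
print(res.fun)  # approximately 25.5 = (9/2)^2 + 2*(3/2)^2 + 3*(1/2)^2
```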
Example 5.4.3. Find all local maximizers for the function f(x, y) = −√(x + y) subject to x ≥ 0
and x^2 y ≥ 108. Then find all global maximizers for the problem or show none exist. Explain.
Solution: The Lagrangian function is
    L = −√(x + y) − λ(108 − x^2 y)
and the associated first order conditions are as follows:
    L_x = −(1/2)(x + y)^{−1/2} + 2λxy ≤ 0,   x ≥ 0   (with "CS")
    L_y = −(1/2)(x + y)^{−1/2} + λx^2 = 0
    L_λ = x^2 y − 108 ≥ 0,   λ ≥ 0   (with "CS")
Both λ = 0 and x = 0 are inconsistent with L_y = 0. Thus, λ > 0 and x > 0 is the only possible
case, and the unique potential solution is (x, y, λ) = (6, 3, 1/216). The associated bordered Hessian is

    H = [ 2λy + (x + y)^{−3/2}/4   2λx + (x + y)^{−3/2}/4   2xy ]   [ 1/27    7/108   36 ]
        [ 2λx + (x + y)^{−3/2}/4   (x + y)^{−3/2}/4         x^2 ] = [ 7/108   1/108   36 ]
        [ 2xy                      x^2                      0   ]   [ 36      36      0  ].

Since n − m = 1, we only have one determinant to check: (−1)^2 det H = 108 > 0. Thus, (x*, y*) =
(6, 3) is a strict local maximizer and the only local maximizer. In order to assess globality, first
note that both x and y must be strictly positive to be feasible, so the feasible set turns out to be
the subset of R^2_{++} where g(x, y) = x^2 y ≥ 108. Checking the bordered Hessian for g, we find
g is strictly quasi-concave on R^2_{++}. Thus, the feasible set, an upper contour set for g, is convex.
The objective function is quasi-concave: for any c ≤ 0, the set of (x, y) such that −√(x + y) ≥ c
is {(x, y) ∈ R^2 | 0 ≤ x + y ≤ c^2}, which is convex (for c > 0 the set is empty). With a quasi-concave objective function and a convex feasible set, the unique local maximizer is the unique
global maximizer.
Chapter 6
Comparative Statics
In many economic problems we need to know how an optimal solution or an equilibrium solution will change when a parameter in the problem changes. For example, how does the utility-maximizing bundle for a competitive consumer change when a price changes, or how does a market
equilibrium price change when a tax on the good changes? These are examples of comparative
statics questions. In each case we are interested in how the endogenous variables (those determined
within the model) are affected by changes in the exogenous variables (the parameters determined
outside the model).
For the consumer choice problem, the endogenous variables are the quantities demanded (chosen by the consumer), while the exogenous variables are prices (outside the control of the competitive consumer). For the market example, the endogenous variable is the market equilibrium price
(determined by supply and demand in the market), while the exogenous variable is the tax rate (determined outside the market in some political process). In this section, you will find two extremely
helpful tools for evaluating comparative statics.
6.1 Cramer’s Rule
Cramer’s Rule provides a recipe for solving linear algebraic equations in terms of determinants. Let
the simultaneous equations be as usual denoted as
Ax = y
(6.1)
where A is a given n × n matrix, y is a given n × 1 vector, and x is the n × 1 vector of unknowns.
The explicit solutions for the components x_1, x_2, . . . , x_n of x in terms of determinants are
    x_1 = det[ y_1 a_12 a_13 . . . a_1n ; y_2 a_22 a_23 . . . a_2n ; . . . ; y_n a_n2 a_n3 . . . a_nn ] / |A|,
    x_2 = det[ a_11 y_1 a_13 . . . a_1n ; a_21 y_2 a_23 . . . a_2n ; . . . ; a_n1 y_n a_n3 . . . a_nn ] / |A|,   . . .      (6.2)
Theorem 6.1.1 (Cramer’s Rule). Let A be a nonsingular matrix. Then the unique solution x =
(x1 , . . . , xn ) of the n × n system Ax = b is
xi =
det Bi
,
det A
for i = 1, . . . , n,
where Bi is the matrix A with the right-hand side b replacing the ith column of A.
This method of solving simultaneous equations is known as Cramer’s Rule. Because the explicit
computation of determinants is extremely tedious for n > 3, this rule has practical value only for
n = 2 and n = 3 (it is marginal for n = 4).
Example 6.1.1. Solve the 3 × 3 linear system
    [ 5 2 1 ] [ x_1 ]   [ 8 ]
    [ 3 2 0 ] [ x_2 ] = [ 5 ]
    [ 1 0 2 ] [ x_3 ]   [ 3 ]
by Cramer's rule:
    x_1 = det[ 8 2 1 ; 5 2 0 ; 3 0 2 ] / det[ 5 2 1 ; 3 2 0 ; 1 0 2 ] = 6/6 = 1,
    x_2 = det[ 5 8 1 ; 3 5 0 ; 1 3 2 ] / 6 = 6/6 = 1,
    x_3 = det[ 5 2 8 ; 3 2 5 ; 1 0 3 ] / 6 = 6/6 = 1.
Example 6.1.2. Solve the 2 × 2 linear algebraic system
    [ 2 + β   −β    ] [ x_1 ]   [ 5 ]
    [ −β      1 + β ] [ x_2 ] = [ 0 ]
by Cramer's rule:
    x_1 = det[ 5  −β ; 0  1 + β ] / det[ 2 + β  −β ; −β  1 + β ] = (5 + 5β)/(2 + 3β),
    x_2 = det[ 2 + β  5 ; −β  0 ] / (2 + 3β) = 5β/(2 + 3β).
−β 1 + β 6.2 Implicit Function Theorem
To understand the Implicit Function Theorem (IFT), consider first the simplest case, one equation
with one endogenous and one exogenous variable of the form
f (x, y) = 0.
(6.3)
Assuming that f is C^1 and (6.3) defines y as a differentiable function of x, implicit differentiation
yields
    (∂f/∂x) dx + (∂f/∂y) dy = 0.
If ∂f/∂y ≠ 0, then
    dy/dx = − (∂f/∂x) / (∂f/∂y).
Now, let’s carry this computation out more generally for the implicit function G(x, y) = c around
the specific point x = x0 , y = y0 . We suppose that there is a C 1 solution y = y(x) to the equation
G(x, y) = c, that is,
G(x, y(x)) = c.
(6.4)
We will use the Chain Rule (section 1.2.5) to differentiate (6.4) with respect to x at x_0:
    (∂G/∂x)(x_0, y(x_0)) · (dx/dx) + (∂G/∂y)(x_0, y(x_0)) · (dy/dx)(x_0) = 0,
that is,
    (∂G/∂x)(x_0, y_0) + (∂G/∂y)(x_0, y_0) · y′(x_0) = 0.
Solving for y′(x_0) yields
    y′(x_0) = dy(x_0)/dx = − (∂G/∂x)(x_0, y_0) / (∂G/∂y)(x_0, y_0).
Theorem 6.2.1 (Implicit Function Theorem—One Exogenous Variable). Let G(x, y) be a C^1 function on an ε-neighborhood about (x_0, y_0) in R^2. Suppose that G(x_0, y_0) = c and consider the
expression
    G(x, y) = c.
If (∂G/∂y)(x_0, y_0) ≠ 0, then there exists a C^1 function y = y(x) defined on an interval I about
the point x_0 such that:
(a) G(x, y(x)) ≡ c for all x in I,
(b) y(x_0) = y_0, and
(c) y′(x_0) = − (∂G/∂x)(x_0, y_0) / (∂G/∂y)(x_0, y_0).
Example 6.2.1. Show that the equation x^2 e^y − 2y + x = 0 defines y as a function of x in an interval
around the point (−1, 0). Find the derivative of this function at x = −1. (Observe that the equation
cannot be explicitly solved for y.)
Solution: Define f(x, y) = x^2 e^y − 2y + x. Then f_1(x, y) = 2xe^y + 1, f_2(x, y) = x^2 e^y − 2,
and f is C^1 everywhere. Furthermore, f(−1, 0) = 0 and f_2(−1, 0) = −1 ≠ 0. According to
Theorem 6.2.1, the equation therefore defines y as a C^1 function of x in an interval around (−1, 0).
Moreover, we have
    y′(−1) = − f_1(x, y)/f_2(x, y) |_{x=−1, y=0} = − (2xe^y + 1)/(x^2 e^y − 2) |_{x=−1, y=0} = −1.
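The implicit derivative can also be approximated numerically. The sketch below (assuming SciPy is available; the bracket [−1, 0.5] and step size are illustrative choices) solves the equation for y at each x near −1 and finite-differences the result:

```python
# Numerical check of Example 6.2.1: y'(-1) should be about -1.
import numpy as np
from scipy.optimize import brentq

def y_of_x(x):
    # For x near -1, solve f(x, y) = 0 for the root near y = 0.
    f = lambda y: x**2 * np.exp(y) - 2*y + x
    return brentq(f, -1.0, 0.5)          # f changes sign on this bracket

h = 1e-5
print(y_of_x(-1.0))                                    # about 0.0
print((y_of_x(-1.0 + h) - y_of_x(-1.0 - h)) / (2*h))   # about -1.0
```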
Example 6.2.2. Consider the equation
    f(x, y) ≡ x^2 − 3xy + y^3 − 7 = 0      (6.5)
about the point (x_0, y_0) = (4, 3). Notice that (4, 3) satisfies equation (6.5). One computes that
    ∂f/∂x = 2x − 3y   and   ∂f/∂y = −3x + 3y^2.
Since (∂f/∂y)(4, 3) = 15 ≠ 0, Theorem 6.2.1 tells us that (6.5) does indeed define y as a C^1
function of x around x_0 = 4, y_0 = 3. Furthermore,
    y′(4) = − (2x − 3y)/(3y^2 − 3x) |_{x=4, y=3} = 1/15.
6.2.1 Several Exogenous Variables
Now consider a case where there exists one equation with one endogenous and several exogenous
variables of the form
G(x1 , x2 , . . . , xk , y) = c.
(6.6)
Around a given point (x∗1 , . . . , x∗k , y ∗ ), we want to vary x = (x1 , . . . , xk ) and then find a y-value
which corresponds to each such (x_1, . . . , x_k). In this case, we say that equation (6.6) defines y as an
implicit function of (x1 , . . . , xk ). Once again, given G and (x∗ , y ∗ ), we want to know whether this
functional relationship exists and, if it does, how does y change if any of the xi ’s change from x∗i .
Since we are working with a function of several variables (x1 , . . . , xk ), we will hold all but one of
the xi ’s constant and vary one exogenous variable at a time. However, this puts us right back in the
two-variable case that we have been discussing. The natural extension of Theorem (6.2.1) to this
setting is the following.
Theorem 6.2.2 (Implicit Function Theorem—Several Exogenous Variables). Let G(x_1, . . . , x_k, y)
be a C^1 function around the point (x*_1, . . . , x*_k, y*). Suppose further that (x*_1, . . . , x*_k, y*) satisfies
    G(x*_1, . . . , x*_k, y*) = c   and   (∂G/∂y)(x*_1, . . . , x*_k, y*) ≠ 0.
Then there exists a C^1 function y = y(x_1, . . . , x_k) defined on an open neighborhood N about
(x*_1, . . . , x*_k) so that:
(a) G(x_1, . . . , x_k, y(x_1, . . . , x_k)) = c for all (x_1, . . . , x_k) in N,
(b) y* = y(x*_1, . . . , x*_k), and
(c) for each index i,
    (∂y/∂x_i)(x*_1, . . . , x*_k) = − (∂G/∂x_i)(x*_1, . . . , x*_k, y*) / (∂G/∂y)(x*_1, . . . , x*_k, y*).      (6.7)
6.2.2 The General Case
Consider the following nonlinear system of m equations in m + n unknowns:
    F_1(y_1, . . . , y_m, x_1, . . . , x_n) = c_1
    F_2(y_1, . . . , y_m, x_1, . . . , x_n) = c_2
    . . .
    F_m(y_1, . . . , y_m, x_1, . . . , x_n) = c_m,      (6.8)
where we want y_1, . . . , y_m to be endogenous and x_1, . . . , x_n to be exogenous. Totally differentiating
the system (6.8) about the point (y*, x*), we obtain
    (∂F_1/∂y_1) dy_1 + · · · + (∂F_1/∂y_m) dy_m + (∂F_1/∂x_1) dx_1 + · · · + (∂F_1/∂x_n) dx_n = 0
    . . .
    (∂F_m/∂y_1) dy_1 + · · · + (∂F_m/∂y_m) dy_m + (∂F_m/∂x_1) dx_1 + · · · + (∂F_m/∂x_n) dx_n = 0,      (6.9)
where all the partial derivatives are evaluated at the point (y∗ , x∗ ). By the Implicit Function Theorem, the linear system (6.9) can be solved for dy1 , . . . , dym in terms of dx1 , . . . , dxn if and only if
the coefficient (Jacobian) matrix of the dyi ’s,
    ∂(F_1, . . . , F_m)/∂(y_1, . . . , y_m) ≡ [ ∂F_1/∂y_1 · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂y_m ]      (6.10)
is nonsingular at (y∗ , x∗ ). Since this system is linear, when the coefficient matrix (6.10) is nonsingular, we can use the inverse of (6.10) to solve the system (6.9) for the dyi ’s in terms of the dxj ’s
and everything else. Thus, in matrix notation we obtain
    [ ∂F_1/∂y_1 · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂y_m ] [ dy_1 ; . . . ; dy_m ]
        = − [ ∂F_1/∂x_1 · · · ∂F_1/∂x_n ; . . . ; ∂F_m/∂x_1 · · · ∂F_m/∂x_n ] [ dx_1 ; . . . ; dx_n ],
which implies
    [ dy_1 ; . . . ; dy_m ] = − [ ∂F_1/∂y_1 · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂y_m ]^{−1}
                                [ Σ_{i=1}^{n} (∂F_1/∂x_i) dx_i ; . . . ; Σ_{i=1}^{n} (∂F_m/∂x_i) dx_i ].      (6.11)
Since the linear approximation (6.9) of the original system (6.8) is a true implicit function of
the dyi ’s in terms of the dxj ’s, the basic principle of calculus leads us to the conclusion that the
nonlinear system (6.8) defines the yi ’s as implicit functions of the xj ’s, at least in a neighborhood
of (y∗ , x∗ ).
Furthermore, one can actually use the linear solution (6.11) of the dyi ’s in terms of the dxj ’s to
find the derivatives of the yi ’s with respect to the xj ’s at (x∗ , y∗ ). To compute ∂yk /∂xh for some
fixed indices h and k, recall that this derivative estimates the effect on yk of a one unit increase in
xh (dxh = 1). So, we set all the dxj ’s equal to zero in (6.9) or (6.11) except dxh , and then we solve
(6.9) or (6.11) for the corresponding dyi ’s. If we use (6.11), we find
    [ dy_1/dx_h ; . . . ; dy_m/dx_h ] = − [ ∂F_1/∂y_1 · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂y_m ]^{−1} [ ∂F_1/∂x_h ; . . . ; ∂F_m/∂x_h ].      (6.12)
Alternatively, we can apply Cramer’s rule to (6.9) and compute
    dy_k/dx_h = − det[ ∂F_1/∂y_1 · · · ∂F_1/∂x_h · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂x_h · · · ∂F_m/∂y_m ]
                / det[ ∂F_1/∂y_1 · · · ∂F_1/∂y_k · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂y_k · · · ∂F_m/∂y_m ],      (6.13)
where the numerator is the Jacobian in the denominator with its kth column replaced by the column of partials with respect to x_h.
The following theorem—the most general form of the Implicit Function Theorem—summarizes
these conclusions.
Theorem 6.2.3. Let F_1, . . . , F_m : R^{m+n} → R^1 be C^1 functions. Consider the system of equations
    F_1(y_1, . . . , y_m, x_1, . . . , x_n) = c_1
    F_2(y_1, . . . , y_m, x_1, . . . , x_n) = c_2
    . . .
    F_m(y_1, . . . , y_m, x_1, . . . , x_n) = c_m      (6.14)
as possibly defining y_1, . . . , y_m as implicit functions of x_1, . . . , x_n. Suppose that (y*, x*) is a
solution of (6.14). If the determinant of the m × m matrix
    [ ∂F_1/∂y_1 · · · ∂F_1/∂y_m ; . . . ; ∂F_m/∂y_1 · · · ∂F_m/∂y_m ]
evaluated at (y*, x*) is nonzero, then there exist C^1 functions
    y_1 = f_1(x_1, . . . , x_n),   . . . ,   y_m = f_m(x_1, . . . , x_n)
defined on a neighborhood N about x* such that
    F_1(f_1(x), . . . , f_m(x), x_1, . . . , x_n) = c_1
    F_2(f_1(x), . . . , f_m(x), x_1, . . . , x_n) = c_2
    . . .
    F_m(f_1(x), . . . , f_m(x), x_1, . . . , x_n) = c_m
for all x = (x_1, . . . , x_n) in N, and
    y*_1 = f_1(x*_1, . . . , x*_n),   . . . ,   y*_m = f_m(x*_1, . . . , x*_n).
Furthermore, one can compute (∂f_k/∂x_h)(x*) = (∂y_k/∂x_h)(y*, x*) by setting dx_h = 1 and
dx_j = 0 for j ≠ h in (6.9) and solving the resulting system for dy_k. This can be accomplished:
(a) by inverting the nonsingular matrix (6.10) to obtain the solution (6.12), or
(b) by applying Cramer's rule to (6.9) to obtain the solution (6.13).
Example 6.2.3. Consider the system of equations
    F_1(x, y, a) ≡ x^2 + axy + y^2 − 1 = 0
    F_2(x, y, a) ≡ x^2 + y^2 − a^2 + 3 = 0      (6.15)
around the point x = 0, y = 1, a = 2. If we change a a little to a′ near a = 2, can we find an (x′, y′)
near (0, 1) so that (x′, y′, a′) satisfies these two equations? To answer this question, we need to find
the Jacobian of (F_1, F_2) (the matrix of partial derivatives with respect to the endogenous variables
x and y) at the point x = 0, y = 1, a = 2:
    det[ ∂F_1/∂x  ∂F_1/∂y ; ∂F_2/∂x  ∂F_2/∂y ](0, 1, 2) = det[ 2 2 ; 0 2 ] = 4 ≠ 0.
Thus, we can solve system (6.15) for x and y as functions of a near (0, 1, 2). Furthermore, using
Cramer’s rule, we obtain
    dy/da = − det[ ∂F_1/∂x  ∂F_1/∂a ; ∂F_2/∂x  ∂F_2/∂a ] / det[ ∂F_1/∂x  ∂F_1/∂y ; ∂F_2/∂x  ∂F_2/∂y ]
          = − det[ 2x + ay   xy ; 2x   −2a ] / det[ 2x + ay   ax + 2y ; 2x   2y ].
Evaluating at x = 0, y = 1, a = 2 gives
    dy/da (0, 1, 2) = − det[ 2  0 ; 0  −4 ] / det[ 2  2 ; 0  2 ] = 8/4 = 2 > 0.
Therefore, if a increases to 2.1, y will increase to approximately 1.2. Let us now use the method of total differentiation to compute the effect on x. Totally differentiating the nonlinear system (6.15), we obtain
(2x + ay)dx + (ax + 2y)dy + xy da = 0
2x dx + 2y dy − 2a da = 0.
Evaluating at x = 0, y = 1, a = 2:
2 dx + 2 dy = 0 da
0 dx + 2 dy = 4 da.
Clearly, dy = 2 da (as we just computed above) and dx = −dy = −2 da. Thus, if a increases to
2.1, x will decrease to approximately −0.2.
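These comparative statics can be confirmed numerically. The sketch below (assuming SciPy is available; the step size and use of fsolve are illustrative choices) re-solves the system for a slightly above and below 2 and differences the solutions:

```python
# Numerical check of Example 6.2.3: dx/da = -2 and dy/da = 2 at (0, 1, 2).
import numpy as np
from scipy.optimize import fsolve

def system(v, a):
    x, y = v
    return [x**2 + a*x*y + y**2 - 1,
            x**2 + y**2 - a**2 + 3]

def solve_xy(a):
    return fsolve(system, x0=[0.0, 1.0], args=(a,))

h = 1e-6
(x_plus, y_plus), (x_minus, y_minus) = solve_xy(2 + h), solve_xy(2 - h)
print((x_plus - x_minus) / (2*h))   # about -2.0
print((y_plus - y_minus) / (2*h))   # about  2.0
```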
Example 6.2.4. Consider the following system
    Y = C + I + G
    C = C(Y − T)
    I = I(r)
    M^s = M(Y, r),      (6.16)
where the nonlinear functions x ↦ C(x), r ↦ I(r), and (Y, r) ↦ M(Y, r) satisfy
    0 < C′(x) < 1,   I′(r) < 0,   ∂M/∂Y > 0,   ∂M/∂r < 0.      (6.17)
System (6.16) can be reduced to
Y − C(Y − T ) − I(r) = G
M (Y, r) = M s ,
where we have defined Y and r as implicit functions of G, M s , and T . Suppose that the current
(G, M s , T ) is (G∗ , M s∗ , T ∗ ) and that the corresponding (Y, r)-equilibrium is (Y ∗ , r ∗ ). If we vary
(G, M s , T ) a little, is there a corresponding equilibrium (Y, r) and how does it change? Totally
differentiating system (6.16), we obtain
    (1 − ∂C/∂Y) dY − (∂I/∂r) dr = dG − (∂C/∂T) dT
    (∂M/∂Y) dY + (∂M/∂r) dr = dM^s
or, in matrix notation,
    [ 1 − ∂C/∂Y   −∂I/∂r ] [ dY ]   [ dG − (∂C/∂T) dT ]
    [ ∂M/∂Y        ∂M/∂r ] [ dr ] = [ dM^s            ],      (6.18)
all evaluated at (Y*, r*, G*, M^{s*}, T*). The determinant of the coefficient matrix in (6.18),
    D ≡ (1 − ∂C/∂Y)(∂M/∂r) + (∂I/∂r)(∂M/∂Y),
is negative by (6.17) and therefore nonzero. By Theorem 6.2.3, the system (6.16) does indeed define
Y and r as implicit functions of G, M s , and T around (Y ∗ , r ∗ , G∗ , M s∗ , T ∗ ). Inverting (6.18), we
compute
    [ dY ]   1  [ ∂M/∂r    ∂I/∂r      ] [ dG − (∂C/∂T) dT ]
    [ dr ] = —  [ −∂M/∂Y   1 − ∂C/∂Y  ] [ dM^s            ].
             D
If we increase government spending G, keeping M^s and T fixed, we find
    dY/dG = (1/D)(∂M/∂r)   and   dr/dG = −(1/D)(∂M/∂Y),
so both Y and r increase. Notice that using Cramer’s rule on system (6.18) (keeping M s and T
fixed), we obtain
    dY/dG = (1/D) det[ 1   −∂I/∂r ; 0   ∂M/∂r ] = (1/D)(∂M/∂r),
    dr/dG = (1/D) det[ 1 − ∂C/∂Y   1 ; ∂M/∂Y   0 ] = −(1/D)(∂M/∂Y),
validating the results we obtained using total differentiation.
Example 6.2.5. Consider the following problem:
    Minimize x + 2y + 4z   subject to   x^2 + y^2 + z^2 = 21.      (6.19)
The Lagrangian function is
L(x, y, z, µ) = −(x + 2y + 4z) − µ(21 − x^2 − y^2 − z^2)
and the associated first order conditions are
    ∂L/∂x = −1 + 2µx = 0
    ∂L/∂y = −2 + 2µy = 0
    ∂L/∂z = −4 + 2µz = 0
    ∂L/∂µ = −21 + x^2 + y^2 + z^2 = 0.
The first three equations can be solved to obtain x = 1/(2µ), y = 1/µ, and z = 2/µ. Substituting
these results into the fourth equation yields µ^2 = 1/4. Thus, there are two potential solutions:
(1, 2, 4, 1/2) and (−1, −2, −4, −1/2). The associated bordered Hessian is

    [ L_xx  L_xy  L_xz  L_xµ ]   [ 2µ   0    0   2x ]
    [ L_yx  L_yy  L_yz  L_yµ ] = [ 0    2µ   0   2y ]
    [ L_zx  L_zy  L_zz  L_zµ ]   [ 0    0    2µ  2z ]
    [ L_µx  L_µy  L_µz  L_µµ ]   [ 2x   2y   2z  0  ].
Evaluated at (1, 2, 4, 1/2), ∆_3 = −8µ(y^2 + z^2) = −80 and ∆_4 = −16µ^2(x^2 + y^2 + z^2) =
−84. Since (−1)^2 ∆_3 < 0, this potential solution fails the second order condition. Evaluated at
(−1, −2, −4, −1/2), ∆_3 = 80 and ∆_4 = −84. This passes the second order test since (−1)^2 ∆_3 =
80 > 0 and (−1)^3 ∆_4 = 84 > 0. Thus, (x*, y*, z*) = (−1, −2, −4) solves the original minimization
problem, with minimized value x + 2y + 4z = −21 (not 21).
If we slightly alter the constraint, so that the problem (6.19) becomes
    Minimize x + 2y + 4z   subject to   x^2 + y^2 + z^2 = 21 + w,
how does the optimal choice of x change as w changes from w = 0? The endogenous variables are
x, y, z, and µ. The exogenous variable is w. The first order conditions remain the same except for
the constraint (∂L/∂µ), which becomes
    ∂L/∂µ = −21 − w + x^2 + y^2 + z^2 = 0.
When w = 0, we already know (−1, −2, −4, −1/2) is the global maximizer (minimizer for the
original problem), since it satisfies the first order conditions. Moreover, the bordered Hessian at
this point,

    H̄ ≡ [ −1   0    0   −2 ]
         [  0  −1    0   −4 ]
         [  0   0   −1   −8 ]
         [ −2  −4   −8    0 ],

remains the same and its determinant is ∆_4 = −84 ≠ 0. Thus, we can apply Theorem 6.2.3 to obtain
    [ dx/dw ; dy/dw ; dz/dw ; dµ/dw ] = −H̄^{−1} [ ∂^2L/∂w∂x ; ∂^2L/∂w∂y ; ∂^2L/∂w∂z ; ∂^2L/∂w∂µ ]
                                      = −H̄^{−1} [ 0 ; 0 ; 0 ; −1 ] = [ −1/42 ; −1/21 ; −2/21 ; 1/84 ].
Even though we are not interested in dµ, µ was one of the endogenous variables in the first order
conditions, so it must be included in this stage, making H̄ a 4 × 4 matrix. Also, note that almost
all terms can be just read off from the problem where w = 0. The only new terms are in the final
vector above.
Alternatively, we could apply Cramer's rule to obtain
    dx/dw = − det[ 0 0 0 −2 ; 0 −1 0 −4 ; 0 0 −1 −8 ; −1 −4 −8 0 ]
            / det[ −1 0 0 −2 ; 0 −1 0 −4 ; 0 0 −1 −8 ; −2 −4 −8 0 ]
          = − (−2)/(−84) = −1/42.
Thus, a one unit increase in w requires a reduction in x by 1/42 units.
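The same comparative static can be computed by solving the linear system directly. A minimal sketch, assuming NumPy is available (the variable names are illustrative):

```python
# Comparative statics of Example 6.2.5: solve Hbar * d = -(dF/dw).
import numpy as np

Hbar = np.array([[-1.,  0.,  0., -2.],
                 [ 0., -1.,  0., -4.],
                 [ 0.,  0., -1., -8.],
                 [-2., -4., -8.,  0.]])
dF_dw = np.array([0., 0., 0., -1.])   # only the constraint equation depends on w

d = -np.linalg.solve(Hbar, dF_dw)     # (dx, dy, dz, dmu)/dw
print(d)   # [-1/42, -1/21, -2/21, 1/84] ≈ [-0.0238, -0.0476, -0.0952, 0.0119]
```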
Chapter 7
Introduction to Complex Numbers
7.1 Basic Operations
7.1.1 Sums and Products
It is customary to denote a complex number (x, y) by z so that
z = (x, y).
The real numbers x and y are, moreover, known as the real and imaginary parts of z, respectively;
and we write
Re z = x
and
Im z = y.
Two complex numbers z1 = (x1 , y1 ) and z2 = (x2 , y2 ) are equal whenever they have the same real
and imaginary parts. That is, when x1 = x2 and y1 = y2 . The sum z1 + z2 and product z1 z2 of two
complex numbers is defined in the following manner:
(x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , y1 + y2 )
(x1 , y1 )(x2 , y2 ) = (x1 x2 − y1 y2 , y1 x2 + x1 y2 ).
Note that the operations defined in the above equations become the usual operations of addition and
multiplication when restricted to the real numbers:
(x1 , 0) + (x2 , 0) = (x1 + x2 , 0)
(x1 , 0)(x2 , 0) = (x1 x2 , 0).
Thus, the complex number system is a natural extension of the real number system. Any complex
number z = (x, y) can be written z = (x, 0) + (0, y) and it is easy to see that (0, 1)(y, 0) = (0, y).
Hence,
z = (x, 0) + (0, 1)(y, 0)
and, if we think of a real number as either x or (x, 0) and let i denote the pure imaginary number
(0, 1), it is clear that
z = x + iy.
Also, we find that
    i^2 = (0, 1)(0, 1) = (−1, 0),   or   i^2 = −1.
7.1.2 Moduli
The modulus or absolute value of a complex number z is defined as the nonnegative real number
√(x^2 + y^2) and is denoted by |z|; that is,
    |z| = √(x^2 + y^2).
Geometrically, the number |z| is the distance between the point (x, y) and the origin, or the length
of the vector representing z. It reduces to the absolute value in the real number system when y = 0.
Note that while the inequality z1 < z2 is meaningless unless both z1 and z2 are real, the statement
|z1 | < |z2 | means that the point z1 is closer to the origin than the point z2 .
7.1.3 Complex Conjugates
The complex conjugate, or simply the conjugate, of a complex number z = x + iy is defined as the
complex number x − iy and is denoted z̄; that is,
    z̄ = x − iy.
The number z̄ is represented by the point (x, −y), which is the reflection about the real axis of the
point (x, y) representing z. Note the following properties of conjugates and moduli: the conjugate
of z̄ is z itself, and |z̄| = |z|. If z_1 = x_1 + iy_1 and z_2 = x_2 + iy_2, then the conjugate of z_1 + z_2 is
    (x_1 + x_2) − i(y_1 + y_2) = (x_1 − iy_1) + (x_2 − iy_2) = z̄_1 + z̄_2.
Thus, the conjugate of the sum is the sum of the conjugates. Similar conclusions can be drawn for
differences, products, and quotients; that is, z_1 − z_2, z_1 z_2, and z_1/z_2 have conjugates z̄_1 − z̄_2,
z̄_1 z̄_2, and z̄_1/z̄_2 (z_2 ≠ 0), respectively.
The sum z + z̄ of a complex number z = x + iy and its conjugate z̄ = x − iy is the real number
2x, and the difference z − z̄ is the pure imaginary number 2iy. Hence
    Re z = (z + z̄)/2   and   Im z = (z − z̄)/(2i).
An important identity relating the conjugate of a complex number z = x + iy to its modulus is
    z z̄ = |z|^2,
where each side is equal to x^2 + y^2. Using the above property, we can establish the fact that the
modulus of a product is the product of the moduli. That is,
    |z_1 z_2| = |z_1||z_2|,
which can be seen by noting that
    |z_1 z_2|^2 = (z_1 z_2)(z̄_1 z̄_2) = (z_1 z̄_1)(z_2 z̄_2) = |z_1|^2 |z_2|^2 = (|z_1||z_2|)^2
and recalling that the modulus is never negative.
Example 7.1.1. Reduce each of the following quantities to a real number:
(a) (1 + 2i)/(3 − 4i) + (2 − i)/(5i)
Solution: Multiply each term by the conjugate of its denominator:
    (1 + 2i)(3 + 4i)/[(3 − 4i)(3 + 4i)] + (2 − i)(−5i)/[(5i)(−5i)] = (10i − 5)/25 − (10i + 5)/25 = −2/5.
(b) 5i/[(1 − i)(2 − i)(3 − i)]
Solution: Multiply by the conjugate of each of the terms in the denominator:
    5i/[(1 − i)(2 − i)(3 − i)] · (1 + i)(2 + i)(3 + i)/[(1 + i)(2 + i)(3 + i)] = 25(i − 1)(i + 1)/100 = −1/2.
(c) (1 − i)^4
Solution: Note that (1 − i)^2 = −2i. Thus,
    (1 − i)^4 = (−2i)^2 = −4.
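These reductions are easy to confirm with Python's built-in complex type (a quick check, not part of the notes):

```python
# Checking Example 7.1.1 with Python's complex arithmetic.
a = (1 + 2j) / (3 - 4j) + (2 - 1j) / (5j)
b = 5j / ((1 - 1j) * (2 - 1j) * (3 - 1j))
c = (1 - 1j) ** 4

print(a)   # approximately -0.4      (i.e. -2/5)
print(b)   # approximately -0.5      (i.e. -1/2)
print(c)   # (-4+0j)
```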
7.2 Exponential Form
Notice that any complex number z can be written as
    z ≡ α + iβ = √(α^2 + β^2) [ α/√(α^2 + β^2) + i β/√(α^2 + β^2) ].
Thus, redefining z in terms of polar coordinates, we obtain
    z = α + iβ = r[cos(θ) + i sin(θ)],   where cos(θ) = α/r and sin(θ) = β/r,      (7.1)
as illustrated in figure 7.1.

[Figure 7.1: Complex Number in Polar Form — the point z = α + iβ plotted in the complex plane, with modulus r and angle θ measured from the positive real axis.]

In complex analysis, the real number r is not allowed to be negative and is the length of the radius vector for z; that is, r = |z|. The
real number θ represents the angle, measured in radians, that z makes with the positive real axis.
As in calculus, θ has an infinite number of possible values, including negative ones, that differ by
multiples of 2π. Those values can be determined from the equation tan(θ) = y/x, where the
quadrant containing the point corresponding to z must be specified. Each value of θ is called an
argument of z, and the set of all such values is denoted by arg z. The principal value of arg z,
denoted by Arg z, is that unique value Θ such that −π < Θ ≤ π. Note that
    arg z = Arg z + 2πn   (n = 0, ±1, ±2, . . .).
Example 7.2.1. The complex number z = −1 − i, which lies in the third quadrant, has principal
argument −3π/4. That is,
    Arg(−1 − i) = −3π/4.
It must be emphasized that, because of the restriction −π < Θ ≤ π on the principal argument Θ, it
is not true that Arg(−1 − i) = 5π/4.
The symbol e^{iθ}, or exp(iθ), is defined by means of Euler's formula as
    e^{iθ} = cos θ + i sin θ,
where θ is to be measured in radians.
Proof. In order to derive this result, first recall the Maclaurin expansions of the functions e^x, cos x,
and sin x:
    e^x = 1 + x + x^2/2! + x^3/3! + · · · ,
    cos x = 1 − x^2/2! + x^4/4! − x^6/6! + · · · ,
    sin x = x − x^3/3! + x^5/5! − x^7/7! + · · · .
For a complex number z, define each of these functions by the above series, replacing the real variable
x with the complex variable z. We then find that
    e^{iz} = 1 + iz + (iz)^2/2! + (iz)^3/3! + (iz)^4/4! + (iz)^5/5! + (iz)^6/6! + (iz)^7/7! + (iz)^8/8! + · · ·
           = 1 + iz − z^2/2! − iz^3/3! + z^4/4! + iz^5/5! − z^6/6! − iz^7/7! + z^8/8! + · · ·
           = [1 − z^2/2! + z^4/4! − z^6/6! + z^8/8! − · · ·] + i[z − z^3/3! + z^5/5! − z^7/7! + · · ·]
           = cos z + i sin z.
This result enables us to write the polar form (7.1) more compactly in exponential form as
    z = re^{iθ}.
Example 7.2.2. The complex number z = −1 − i in the previous example has exponential form
    −1 − i = √2 exp[i(−3π/4)].
This expression is, of course, only one of an infinite number of possibilities for the exponential form
of z.
It is geometrically obvious that
    e^{iπ} = −1,   e^{−iπ/2} = −i,   e^{−i4π} = 1.
Note too that the equation z = Re^{iθ} is a parametric representation of the circle |z| = R centered at the origin with radius R.
Another nice aspect of complex numbers written in polar form is that their powers are easily
computed. For example,
    z^2 = (α + iβ)^2 = r^2 [cos(θ) + i sin(θ)]^2
        = r^2 [cos^2(θ) + 2i cos(θ) sin(θ) − sin^2(θ)]
        = r^2 [cos(2θ) + i sin(2θ)],
using the double angle formulas cos(2a) = cos^2(a) − sin^2(a) and sin(2a) = 2 sin(a) cos(a).
Continuing in this manner, we obtain the following result.
Definition 7.2.1 (DeMoivre’s formula). For complex number z = α + iβ with polar representation
r [cos(θ) + i sin(θ)] and any positive integer n,
z n = (α + iβ)n = r n [cos(nθ) + i sin(nθ)] .
(7.2)
This result can alternatively be derived in a much simpler manner by noting that (e^{iθ})^n = e^{inθ}.
Example 7.2.3. In order to put (√3 + i)^7 in rectangular form, one need only write
    (√3 + i)^7 = (2e^{iπ/6})^7 = 2^7 e^{i7π/6} = (2^6 e^{iπ})(2e^{iπ/6}) = −64(√3 + i).
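The standard-library cmath module makes this easy to verify numerically (a quick check, not part of the notes):

```python
# Checking Example 7.2.3 with cmath.
import cmath

z = complex(3 ** 0.5, 1)         # sqrt(3) + i, with r = 2 and theta = pi/6
print(abs(z), cmath.phase(z))    # 2.0 and about 0.5236 (= pi/6)

lhs = z ** 7
rhs = -64 * z                    # DeMoivre: (2 e^{i pi/6})^7 = -64(sqrt(3) + i)
print(lhs)                       # approximately (-110.85 - 64j)
print(rhs)                       # approximately (-110.85 - 64j)
```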
7.3 Complex Eigenvalues
Complex eigenvalues of real matrices occur in complex conjugate pairs. That is, if z = α + iβ is a
root of the characteristic polynomial, so is z̄ = α − iβ. To see this, consider a general 2 × 2 matrix
A given by
    A = [ a  b ; c  d ].
Then the eigenvalues of the matrix A can be found by solving the following equation:
    det(A − λI) = det[ a − λ   b ; c   d − λ ] = (a − λ)(d − λ) − cb = 0.
The associated roots are then given by
    λ = (a + d)/2 ± (1/2)√[(a + d)^2 − 4(ad − bc)]
or, after simplification,
    λ = (a + d)/2 ± (1/2)√[(a − d)^2 + 4bc].
If the discriminant (the expression under the square root) is negative, we will have complex roots.
That is, if (a − d)^2 + 4bc < 0, then the roots are complex (conjugate) pairs of the form
    λ_1 = (a + d)/2 + (i/2)√|(a − d)^2 + 4bc|,
    λ_2 = λ̄_1 = (a + d)/2 − (i/2)√|(a − d)^2 + 4bc|.
The fact that complex eigenvalues come in complex conjugate pairs and the fact that λ ≠ λ̄ imply
that complex eigenvalues of a 2 × 2 system are always distinct. It is only with 4 × 4 (and larger) matrices that the
possibility of repeated complex eigenvalues arises. We next show that complex eigenvectors also
come in conjugate pairs.
Suppose the general form of an eigenvalue of matrix A is given by λ = α + iβ. Then the corresponding eigenvector, w, is a non-zero solution to
    [A − (α + iβ)I] w = 0,
or, reformulated,
    Aw = (α + iβ)w.      (7.3)
Now write the complex vector w in its general form: w = u + iv, where u and v are real vectors.
Then (7.3) becomes
    A(u + iv) = (α + iβ)(u + iv).      (7.4)
Applying the conjugate to both sides and recalling that for any two complex numbers z_1 and z_2 the
conjugate of z_1 z_2 is z̄_1 z̄_2, we obtain
    A(u − iv) = (α − iβ)(u − iv),   i.e.,   Aw̄ = (α − iβ)w̄.      (7.5)
Then from (7.4) and (7.5) we see that complex eigenvectors also come in conjugate pairs. More
specifically, if u + iv is an eigenvector for α + iβ, then u − iv is an eigenvector for α − iβ. The
following theorem summarizes the discussion thus far.
Theorem 7.3.1. Let A be a k × k matrix with real entries. If λ = α + iβ is an eigenvalue of A, so
is its conjugate λ̄ = α − iβ. If (u + iv) is an eigenvector for eigenvalue λ = α + iβ, then (u − iv)
is an eigenvector for eigenvalue λ̄ = α − iβ. If k is an odd number, then A must have at least one
real eigenvalue.
Example 7.3.1. Consider the 2 × 2 matrix
    A = [ 1  1 ; −9  1 ].
Its characteristic polynomial is given by
    p(λ) = λ^2 − 2λ + 10,
and its corresponding roots are
    λ_{1,2} = 1 ± (1/2)√(4 − 4(10)) = 1 ± (1/2)√(−36) = 1 ± 3i.
We now have two complex eigenvalues that are complex conjugates. Now use λ_1 = 1 + 3i to
calculate the first eigenvector for matrix A:
    [A − (1 + 3i)I] w = [ −3i  1 ; −9  −3i ] [ w_1 ; w_2 ] = [ 0 ; 0 ].
Row-reducing the coefficient matrix in the above equation yields
    −3i w_1 + w_2 = 0.
Normalizing w_1 to 1, we get w_2 = 3i. Thus, the first eigenvector is
    w_1 = [ 1 ; 3i ] = [ 1 ; 0 ] + i [ 0 ; 3 ].
Since eigenvectors of complex eigenvalues come in conjugate pairs, it must be the case that the
second eigenvector is given by
    w_2 = [ 1 ; 0 ] − i [ 0 ; 3 ].
Note that the second eigenvector could also have been found by using the second eigenvalue λ_2 =
1 − 3i.
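A quick NumPy check of this example (assuming NumPy is available; NumPy normalizes eigenvectors to unit length, so they are proportional to, not equal to, the vectors above):

```python
# Verifying Example 7.3.1 with NumPy.
import numpy as np

A = np.array([[1., 1.],
              [-9., 1.]])
eigvals, eigvecs = np.linalg.eig(A)

print(eigvals)        # 1+3j and 1-3j (order may vary)
print(eigvecs[:, 0])  # proportional to (1, 3i), scaled to unit length
```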
Chapter 8
Linear Difference Equations and Lag
Operators
8.1 Lag Operators
The backshift or lag operator is defined by
    L x_t = x_{t−1},   L^n x_t = x_{t−n}   for n = . . . , −2, −1, 0, 1, 2, . . . .
Multiplying a variable x_t by L^n thus gives the value of x shifted back n periods. Notice that if
n < 0, the effect of multiplying x_t by L^n is to shift x forward in time by |n| periods.
Consider polynomials in the lag operator given by
    A(L) = a_0 + a_1 L + a_2 L^2 + · · · = Σ_{j=0}^{∞} a_j L^j,
where the a_j's are constant and L^0 ≡ 1. Operating on x_t with A(L) yields a moving sum of x's:
    A(L)x_t = (a_0 + a_1 L + a_2 L^2 + · · · )x_t = a_0 x_t + a_1 x_{t−1} + a_2 x_{t−2} + · · · = Σ_{j=0}^{∞} a_j x_{t−j}.
To take a simple example of a rational polynomial in L, consider
    A(L) = 1/(1 − λL) = 1 + λL + λ^2 L^2 + · · · ,      (8.1)
which holds assuming the normal expansion for an infinite geometric series applies in this setting.
Note that this can be verified by multiplying both sides of the equality by (1 − λL). However, this
result is only of practical use when |λ| < 1. To see why, consider the following infinite sum
    [1/(1 − λL)] x_t = Σ_{j=0}^{∞} λ^j x_{t−j},
for the case in which the path of x_t is constant at x. Then
    [1/(1 − λL)] x_t = x Σ_{j=0}^{∞} λ^j,
and if |λ| > 1, we get that the above sum is unbounded. In some instances we need a solution for
the case in which x is constant and we are looking infinitely far back in time. Thus, we impose
the condition |λ| < 1. It is quite useful to realize that we can use an alternative expansion for the
geometric polynomial 1/(1 − λL). Formally, if the normal expansion for this infinite geometric
series applies, then
    1/(1 − λL) = (−λL)^{−1}/(1 − (λL)^{−1}) = −(1/(λL))[1 + 1/(λL) + 1/(λL)^2 + · · ·]
               = −(1/λ)L^{−1} − (1/λ)^2 L^{−2} − (1/λ)^3 L^{−3} − · · · .      (8.2)
Thus, operating on x_t gives
    [1/(1 − λL)] x_t = −(1/λ)x_{t+1} − (1/λ)^2 x_{t+2} − · · · = −Σ_{j=1}^{∞} (1/λ)^j x_{t+j},
which shows [1/(1 − λL)]x_t to be a geometrically declining weighted sum of future values of x.
Notice that for this infinite sum to be finite for a constant time path x_{t+j} = x for all j and t, the
series −Σ_{j=1}^{∞} (1/λ)^j must be convergent, which requires that |1/λ| < 1 or, equivalently, |λ| > 1.
8.2 First-Order Difference Equations
A linear difference equation can be defined as an equation that relates the endogenous (determined
within the model) variable y_t to its previous values linearly. The simplest one is a first-order scalar
linear difference equation such as
    y_t = λ y_{t−1} + b x_t + a,      (8.3)
where x_t is an exogenous (determined outside the model) variable and a is a constant. This is a first-order difference equation, since y_t depends only on its first lag y_{t−1}. Here, we are interested in
finding a solution for y_t in terms of current, past, or future values of the exogenous variable x_t and
(less importantly) the constant a. Put differently, we want to characterize the endogenous sequence
y_t in terms of the exogenous sequence x_t.
Using the lag operator defined above, we can rewrite (8.3) as follows:
    (1 − λL)y_t = b x_t + a.      (8.4)
Operating on both sides of this equation by (1 − λL)^{−1}, we can obtain a particular solution to
(8.4), denoted ŷ_t, as follows:
    ŷ_t = b x_t/(1 − λL) + a/(1 − λ).      (8.5)
Note that since a is a constant, a/(1 − λL) = a/(1 − λ) irrespective of the size of |λ| (This can
be verified by considering the expansion given in (8.1) when |λ| < 1 and the expansion given in
(8.2) when |λ| > 1). In order to obtain the general solution, we need to add a term to (8.5). For this
purpose, suppose that ỹ_t = ŷ_t + w_t is also a solution to (8.4). Then, using the particular solution to
(8.4), we obtain
    (1 − λL)ỹ_t = (1 − λL)ŷ_t + w_t − λw_{t−1} = b x_t + a + w_t − λw_{t−1}.
Therefore, as long as wt = λwt−1 , ỹt is also a solution. Note that we can iterate on this condition
to obtain
wt = λwt−1 = λ2 wt−2 = λ3 wt−3 = · · · = λt w0 ,
where w_0 ≡ c, an arbitrary initial value. Hence, the general solution to (8.3) is given by
    y_t = b x_t/(1 − λL) + a/(1 − λ) + λ^t c = b Σ_{j=0}^{∞} λ^j x_{t−j} + a/(1 − λ) + λ^t c,      (8.6)
where c is an arbitrary constant. Notice that for y_t, defined by (8.6), to be finite, λ^j x_{t−j} must be
small for large j. That is, we require
    lim_{n→∞} Σ_{j=n}^{∞} λ^j x_{t−j} = 0   for all t.
For the case of x_{t−j} = x for all j and t, the above condition requires |λ| < 1. Notice also that the
infinite sum a Σ_{j=0}^{∞} λ^j in (8.6) is finite only if |λ| < 1, in which case it equals a/(1 − λ) for
a ≠ 0 and 0 otherwise. Tentatively, assume that |λ| < 1.
In order to analyze (8.6), rewrite the equation for t ≥ 1 as
    y_t = a Σ_{j=0}^{t−1} λ^j + a Σ_{j=t}^{∞} λ^j + b Σ_{j=0}^{t−1} λ^j x_{t−j} + b Σ_{j=t}^{∞} λ^j x_{t−j} + λ^t c
        = a(1 − λ^t)/(1 − λ) + aλ^t/(1 − λ) + b Σ_{j=0}^{t−1} λ^j x_{t−j} + bλ^t Σ_{j=0}^{∞} λ^j x_{0−j} + λ^t c
        = a(1 − λ^t)/(1 − λ) + b Σ_{j=0}^{t−1} λ^j x_{t−j} + λ^t [ a/(1 − λ) + b Σ_{j=0}^{∞} λ^j x_{0−j} + λ^0 c ]
        = a(1 − λ^t)/(1 − λ) + b Σ_{j=0}^{t−1} λ^j x_{t−j} + λ^t y_0      (using (8.6))
        = a/(1 − λ) + λ^t [ y_0 − a/(1 − λ) ] + b Σ_{j=0}^{t−1} λ^j x_{t−j},   t ≥ 1.
Consider the special case in which x_t = 0 for all t. Under this condition, we obtain
    y_t = a/(1 − λ) + λ^t [ y_0 − a/(1 − λ) ].      (8.7)
Notice that if the initial condition y_0 = a/(1 − λ), then y_t = y_0. In this case, y is constant across
all future time periods and a/(1 − λ) is known as a stationary point. Moreover, it is easy to see that
for |λ| < 1, the second term in (8.7) tends to zero and thus
    lim_{t→∞} y_t = a/(1 − λ).
This shows that the system is stable, tending to approach the stationary value as time passes.
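As a small illustration of this convergence result, the following Python sketch (assuming NumPy is available; the parameter values are arbitrary) simulates y_t = λy_{t−1} + a with |λ| < 1 and x_t = 0 and compares the path with the stationary point a/(1 − λ):

```python
# Simulating a stable first-order difference equation: y_t = lam*y_{t-1} + a.
import numpy as np

lam, a, y0, T = 0.8, 1.0, 0.0, 50
y = np.empty(T + 1)
y[0] = y0
for t in range(1, T + 1):
    y[t] = lam * y[t - 1] + a

print(y[-1])           # close to 5.0 after 50 periods
print(a / (1 - lam))   # stationary point a/(1 - lam) = 5.0
```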
The difference equation (8.4) can also be solved using the alternative representation of (1 − λL)^{−1}
given in (8.2). Using this result, the general solution is given by
    y_t = [(−λL)^{−1}/(1 − (λL)^{−1})] b x_t + [(−λL)^{−1}/(1 − (λL)^{−1})] a + λ^t c
        = −b Σ_{j=1}^{∞} λ^{−j} x_{t+j} + a/(1 − λ) + λ^t c.      (8.8)
The equivalence of the solutions (8.6) and (8.8) will hold whenever
    [b/(1 − λL)] x_t   and   [(λL)^{−1}/(1 − (λL)^{−1})] b x_t
are both finite. However, it is often the case that one of these two conditions fails to hold. For
example, if the sequence {x_t} is bounded, this is sufficient to imply that {[b/(1 − λL)]x_t} is a
bounded sequence if |λ| < 1, but not sufficient to imply that
    [(λL)^{−1}/(1 − (λL)^{−1})] b x_t
is a convergent sum for all t. Similarly, if |λ| > 1, boundedness of the sequence {x_t} is sufficient
to imply that
    [(λL)^{−1}/(1 − (λL)^{−1})] b x_t
is a bounded sequence, but fails to guarantee finiteness of [b/(1 − λL)]x_t. In instances where one of
    [b/(1 − λL)] x_t   or   [(λL)^{−1}/(1 − (λL)^{−1})] b x_t
is always finite and the other is not, we shall take our solution to the first-order difference equation
(8.4) to be either (8.6), where the backward sum in x_t is finite, or (8.8), where the forward sum in x_t is
finite. This procedure assures us that we shall find the unique solution of (8.4) that is finite for all t,
provided that such a solution exists.
If we want to guarantee that the sequence {y_t} given by (8.6) or (8.8) is bounded for all t, it is
evident that we must set c = 0. This is necessary since if λ > 1 and c > 0,
    lim_{t→∞} cλ^t = ∞,
while if λ < 1 and c > 0,
    lim_{t→−∞} cλ^t = ∞.
Thus, when |λ| < 1, the bounded sequence y_t from (8.6) can be obtained by following the backward
representation with the initial condition c = 0 and is given by
    y_t = b[1 + λL + (λL)^2 + (λL)^3 + · · ·] x_t + a/(1 − λ) = b Σ_{j=0}^{∞} λ^j x_{t−j} + a/(1 − λ).
On the other hand, when |λ| > 1, we need to use the forward representation in order to get the bounded
sequence y_t as follows:
    y_t = [(−λL)^{−1}/(1 − (λL)^{−1})] b x_t + [(−λL)^{−1}/(1 − (λL)^{−1})] a = −b Σ_{j=1}^{∞} λ^{−j} x_{t+j} + a/(1 − λ),
again setting c = 0. In general, the convention is to solve stable roots (|λ| < 1) backward and
unstable roots (|λ| > 1) forward.
8.3 Second-Order Difference Equations
A second-order difference equation relates the endogenous variable y_t to its previous two values,
y_{t−1} and y_{t−2}, linearly. Consider the following second-order difference equation:
    y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + b x_t + a,      (8.9)
where x_t is again an exogenous sequence of real numbers for t = . . . , −1, 0, 1, . . .. Using the lag
operator, we can write (8.9) as follows:
    (1 − φ_1 L − φ_2 L^2) y_t = b x_t + a.
It is convenient to write the polynomial 1 − φ_1 L − φ_2 L^2 in an alternative way, given by the factorization
    1 − φ_1 L − φ_2 L^2 = (1 − λ_1 L)(1 − λ_2 L) = 1 − (λ_1 + λ_2)L + λ_1 λ_2 L^2,
where λ_1 λ_2 = −φ_2 and λ_1 + λ_2 = φ_1. To see how λ_1 and λ_2 are related to the roots or zeros of the
polynomial A(z) = 1 − φ_1 z − φ_2 z^2, notice that
    (1 − λ_1 z)(1 − λ_2 z) = λ_1 λ_2 (1/λ_1 − z)(1/λ_2 − z).
Note that we use a function of a number z (possibly complex) instead of the lag operator L since
it does not really make much sense to talk about roots or zeros of a polynomial that is a function of a lag
operator. If we set the above expression to zero in order to solve for its roots, it is clear that the equation
is satisfied at the two roots z_1 = 1/λ_1 and z_2 = 1/λ_2. Given the polynomial A(z) = 1 − φ_1 z − φ_2 z^2,
the roots 1/λ_1 and 1/λ_2 are found by solving the characteristic equation
    1 − φ_1 z − φ_2 z^2 = 0
for two values of z. Given that λ_i = z_i^{−1} for i = 1, 2, multiplying the above equation by z^{−2} yields
    z^{−2} − z^{−1} φ_1 − φ_2 = λ^2 − φ_1 λ − φ_2 = 0.
Applying the quadratic formula then gives
    λ_i = [φ_1 ± √(φ_1^2 + 4φ_2)]/2,
which enables us to obtain the reciprocals of λ_1 and λ_2 for given values of φ_1 and φ_2.
8.3.1 Distinct Real Eigenvalues
General Solution
When λ_1 ≠ λ_2 and λ_i ≠ 1 for all i, the second order difference equation (8.9) can be written as
    (1 − λ_1 L)(1 − λ_2 L) y_t = b x_t + a.      (8.10)
Thus, the general solution to (8.10) is given by
    y_t = b x_t/[(1 − λ_1 L)(1 − λ_2 L)] + a/[(1 − λ_1)(1 − λ_2)] + λ_1^t c_1 + λ_2^t c_2,      (8.11)
where c_1 and c_2 are arbitrary constants, which can be verified by noticing that
    (1 − λ_1 L)(1 − λ_2 L) c_1 λ_1^t = 0   and   (1 − λ_1 L)(1 − λ_2 L) c_2 λ_2^t = 0.
Particular Solution
If both eigenvalues are distinct as in (8.11), then the above coefficient can be written as
    1/[(1 − λ_1 L)(1 − λ_2 L)] = [1/(λ_1 − λ_2)] [ λ_1/(1 − λ_1 L) − λ_2/(1 − λ_2 L) ].      (8.12)
Thus, if either a = 0 or the magnitude of both eigenvalues is strictly less than unity (that is, if
|λ_1| < 1 and |λ_2| < 1), (8.11) can be written as
    y_t = a/[(1 − λ_1)(1 − λ_2)] + [1/(λ_1 − λ_2)] [ λ_1/(1 − λ_1 L) − λ_2/(1 − λ_2 L) ] b x_t + λ_1^t c_1 + λ_2^t c_2
        = a Σ_{j=0}^{∞} λ_1^j Σ_{j=0}^{∞} λ_2^j + [λ_1 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_1^j x_{t−j} − [λ_2 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_2^j x_{t−j} + λ_1^t c_1 + λ_2^t c_2,      (8.13)
provided that
    lim_{n→∞} Σ_{j=n}^{∞} λ_i^j x_{t−j} = 0   for all t,
for i = 1, 2. Note that this stipulation is needed so that the corresponding geometric sums are finite.
In order to analyze (8.13), let’s first consider the special case where a = 0, so that this equation
holds regardless of the magnitude of λi . Then rewriting (8.13) for t ≥ 1 gives
    y_t = [λ_1 b/(λ_1 − λ_2)] Σ_{j=0}^{t−1} λ_1^j x_{t−j} − [λ_2 b/(λ_1 − λ_2)] Σ_{j=0}^{t−1} λ_2^j x_{t−j}
          + [λ_1 b/(λ_1 − λ_2)] Σ_{j=t}^{∞} λ_1^j x_{t−j} − [λ_2 b/(λ_1 − λ_2)] Σ_{j=t}^{∞} λ_2^j x_{t−j} + λ_1^t c_1 + λ_2^t c_2
        = [λ_1 b/(λ_1 − λ_2)] Σ_{j=0}^{t−1} λ_1^j x_{t−j} − [λ_2 b/(λ_1 − λ_2)] Σ_{j=0}^{t−1} λ_2^j x_{t−j} + λ_1^t θ_0 + λ_2^t η_0,
where
    θ_0 ≡ c_1 + [λ_1 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_1^j x_{0−j}   and   η_0 ≡ c_2 − [λ_2 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_2^j x_{0−j}.
Thus, for the case in which x_t = 0 for t ≥ 1, we obtain
    y_t = λ_1^t θ_0 + λ_2^t η_0.      (8.14)
If θ0 = η0 = 0, then yt = 0 for all t ≥ 1, regardless of the values of λ1 and λ2 . Thus, y = 0
is the stationary point or long-run equilibrium value of (8.14). If λ_1 and λ_2 are real, then lim_{t→∞} y_t
will equal its stationary point if and only if both |λ_1| < 1 and |λ_2| < 1. If one or both of the λ's
exceed one in absolute value, the behavior of y will eventually be dominated by the term in (8.14)
associated with the λ that is larger in absolute value.
Now let’s return to the more general case. If we are interested in a bounded sequence {yt }
mapped from a bounded sequence {xt }, then we need to set both of the constants c1 and c2 to zero,
and focus on the associated particular solution. If the magnitudes of both eigenvalues are strictly
less than unity, that is, if |λ1 | < 1 and |λ2 | < 1, then the bounded solution to (8.9) is given by
    y_t = a/[(1 − λ_1)(1 − λ_2)] + [λ_1 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_1^j x_{t−j} − [λ_2 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_2^j x_{t−j}.
If, without loss of generality, |λ_1| < 1 and |λ_2| > 1, then we can write
    1/[(1 − λ_1 L)(1 − λ_2 L)] = [1/(λ_1 − λ_2)] [ λ_1/(1 − λ_1 L) + L^{−1}/(1 − (λ_2 L)^{−1}) ]
                               = [λ_1/(λ_1 − λ_2)] Σ_{j=0}^{∞} (λ_1 L)^j + [λ_2/(λ_1 − λ_2)] Σ_{j=1}^{∞} (λ_2 L)^{−j}.
Thus, in this case, the bounded solution to (8.9) is given by
    y_t = a/[(1 − λ_1)(1 − λ_2)] + [λ_1 b/(λ_1 − λ_2)] Σ_{j=0}^{∞} λ_1^j x_{t−j} + [λ_2 b/(λ_1 − λ_2)] Σ_{j=1}^{∞} λ_2^{−j} x_{t+j}.
Finally, if |λ_1| > 1 and |λ_2| > 1, then we can write
    1/[(1 − λ_1 L)(1 − λ_2 L)] = [1/(λ_1 − λ_2)] [ −L^{−1}/(1 − (λ_1 L)^{−1}) + L^{−1}/(1 − (λ_2 L)^{−1}) ]
                               = −[λ_1/(λ_1 − λ_2)] Σ_{j=1}^{∞} (λ_1 L)^{−j} + [λ_2/(λ_1 − λ_2)] Σ_{j=1}^{∞} (λ_2 L)^{−j}.
Therefore, in this final case, the bounded solution to (8.9) is given by
    y_t = a/[(1 − λ_1)(1 − λ_2)] − [λ_1 b/(λ_1 − λ_2)] Σ_{j=1}^{∞} λ_1^{−j} x_{t+j} + [λ_2 b/(λ_1 − λ_2)] Σ_{j=1}^{∞} λ_2^{−j} x_{t+j}.
8.3.2 Complex Eigenvalues
Recall that if one eigenvalue turns out to be a complex number, then the other eigenvalue is its
complex conjugate. That is, if λ_1 = α + iβ, then λ_2 = α − iβ, where
    λ_1 + λ_2 = 2α = φ_1,   λ_1 λ_2 = α^2 + β^2 = −φ_2,   |λ_i| = √(α^2 + β^2) for all i.
Moreover, using the polar representation defined in section 7.2, we have
    λ_1 = re^{iw} = r(cos w + i sin w),   λ_2 = re^{−iw} = r(cos w − i sin w),
where r = √(α^2 + β^2) and tan w = β/α. Finally, notice that
    λ_1 + λ_2 = r(e^{iw} + e^{−iw}) = 2r cos w   and   λ_1 − λ_2 = r(e^{iw} − e^{−iw}) = 2ri sin w.
Returning to the special case where a = 0 and x_t = 0, when the eigenvalues are complex, (8.14)
becomes
    y_t = θ_0 (re^{iw})^t + η_0 (re^{−iw})^t
        = θ_0 r^t e^{iwt} + η_0 r^t e^{−iwt}
        = θ_0 r^t [cos wt + i sin wt] + η_0 r^t [cos wt − i sin wt]
        = (θ_0 + η_0) r^t cos wt + i(θ_0 − η_0) r^t sin wt.
Since y_t must be a real number for all t, it follows that θ_0 + η_0 must be real and θ_0 − η_0 must be
imaginary. Therefore, θ_0 and η_0 must be complex conjugates, say θ_0 = pe^{iθ} and η_0 = pe^{−iθ}. Thus,
we can write
    y_t = pe^{iθ} r^t e^{iwt} + pe^{−iθ} r^t e^{−iwt} = p r^t [ e^{i(wt+θ)} + e^{−i(wt+θ)} ]
        = p r^t [ cos(wt + θ) + i sin(wt + θ) + cos(−(wt + θ)) + i sin(−(wt + θ)) ]
        = 2p r^t cos(wt + θ),
where we have made use of the fact that cos is an even function and sin is an odd function (i.e.,
for any input x, cos(x) = cos(−x) and sin(−x) = −sin(x)). The path of y_t oscillates with a
frequency determined by w. The dampening factor, r^t, is determined by the amplitude, r, of the
complex roots. When r < 1, the stationary point of the difference equation, y_t = 0, is approached
as t → ∞. Moreover, as long as w ≠ 0, the system displays damped oscillations. If r = 1, y_t
displays repeated oscillations of unchanging amplitude and the solution is periodic. If r > 1, the
path of y_t displays explosive oscillations, unless the initial conditions are, say, y_0 = 0 and y_1 = 0 so
that y starts out at the stationary point for two successive values.
Now if we once again consider the more general case where we are interested in a bounded
sequence {y_t} mapped from a bounded sequence {x_t}, we need to set both of the constants c_1
and c_2 to zero and focus on the associated particular solution. Noting that the moduli of the two
complex eigenvalues are the same, when |λ| < 1 we can write
    1/[(1 − λ_1 L)(1 − λ_2 L)] = [λ_1/(λ_1 − λ_2)] Σ_{j=0}^{∞} (λ_1 L)^j − [λ_2/(λ_1 − λ_2)] Σ_{j=0}^{∞} (λ_2 L)^j
                               = [1/(λ_1 − λ_2)] Σ_{j=0}^{∞} (λ_1^{j+1} − λ_2^{j+1}) L^j
                               = [1/(2ri sin w)] Σ_{j=0}^{∞} 2r^{j+1} i sin[w(j + 1)] L^j
                               = Σ_{j=0}^{∞} r^j { sin[w(j + 1)]/sin w } L^j,
since λ_1^{j+1} − λ_2^{j+1} = (re^{iw})^{j+1} − (re^{−iw})^{j+1} = 2i r^{j+1} sin[w(j + 1)]. Thus, the
bounded solution to (8.9) is given by
    y_t = a Σ_{j=0}^{∞} r^j sin[w(j + 1)]/sin w + b Σ_{j=0}^{∞} r^j { sin[w(j + 1)]/sin w } x_{t−j}.
If, on the other hand, |λ| > 1, we can write
    1/[(1 − λ_1 L)(1 − λ_2 L)] = −[λ_1/(λ_1 − λ_2)] Σ_{j=1}^{∞} (λ_1 L)^{−j} + [λ_2/(λ_1 − λ_2)] Σ_{j=1}^{∞} (λ_2 L)^{−j}
                               = −[1/(λ_1 − λ_2)] Σ_{j=0}^{∞} (λ_1^{−j} − λ_2^{−j}) L^{−(j+1)}
                               = Σ_{j=0}^{∞} r^{−(j+1)} { sin(wj)/sin w } L^{−(j+1)},
since λ_1^{−j} − λ_2^{−j} = −2i r^{−j} sin(wj) and λ_1 − λ_2 = 2ri sin w. Thus, in this case, the
bounded solution to (8.9) is given by
    y_t = a Σ_{j=0}^{∞} r^{−(j+1)} sin(wj)/sin w + b Σ_{j=0}^{∞} r^{−(j+1)} { sin(wj)/sin w } x_{t+j+1}.
8.3.3 Stability Conditions for Distinct Eigenvalues
Recall that the roots of (8.9) are given by
    λ_i = [φ_1 ± √(φ_1^2 + 4φ_2)]/2.
For the roots to be complex, the discriminant must be negative, i.e.,
    φ_1^2 + 4φ_2 < 0,
which implies that φ_2 is negative. When the above condition is satisfied, the roots are given by
    λ_1 = φ_1/2 + i√(−(φ_1^2 + 4φ_2))/2 ≡ a + ib,   λ_2 = φ_1/2 − i√(−(φ_1^2 + 4φ_2))/2 ≡ a − ib.
Once again, recall that in polar form
    a + ib = r[cos w + i sin w] = re^{iw},
where r ≡ √(a^2 + b^2) and tan w = b/a. Thus we have that
    r = √[ (φ_1/2)^2 − (φ_1^2 + 4φ_2)/4 ] = √(−φ_2).
For the oscillations to be damped, meaning that in the long run the difference equation will be
stable, we require that r = √(−φ_2) < 1, which requires that φ_2 > −1.
If the roots are real, the difference equation will be stable if both roots are less than one in
magnitude. This requires
    −1 < [φ_1 + √(φ_1^2 + 4φ_2)]/2 < 1   and   −1 < [φ_1 − √(φ_1^2 + 4φ_2)]/2 < 1.
Note that it is sufficient to find conditions such that the expression on the left hand side is less than
unity while the expression on the right hand side is greater than minus one. The former condition requires
    (1/2)[φ_1 + √(φ_1^2 + 4φ_2)] < 1  →  √(φ_1^2 + 4φ_2) < 2 − φ_1
                                      →  φ_1^2 + 4φ_2 < 4 + φ_1^2 − 4φ_1
                                      →  φ_1 + φ_2 < 1.
The latter condition requires
    (1/2)[√(φ_1^2 + 4φ_2) − φ_1] < 1  →  √(φ_1^2 + 4φ_2) < 2 + φ_1
                                      →  φ_1^2 + 4φ_2 < 4 + φ_1^2 + 4φ_1
                                      →  φ_2 − φ_1 < 1.
Therefore, when φ_2 > −1, φ_1 + φ_2 < 1, and φ_2 − φ_1 < 1 hold, the roots, regardless of whether
they are real (φ_1^2 + 4φ_2 ≥ 0) or complex (φ_1^2 + 4φ_2 < 0), will yield a stable second order difference
equation. Figure 8.1 (Second Order Difference Equation: Regions of Stability) summarizes these results.
8.3.4 Repeated Real Eigenvalues
General Solution
When λ_1 = λ_2 ≡ λ and λ ≠ 1, the second order difference equation (8.9) becomes
    (1 − λL)^2 y_t = b x_t + a.      (8.15)
Thus, the general solution to (8.15) is given by
    y_t = b x_t/(1 − λL)^2 + a/(1 − λL)^2 + λ^t c_1 + tλ^t c_2,      (8.16)
where c_1 and c_2 are constants. To see this, note that
    (1 − λL)^2 (λ^t c_1 + tλ^t c_2) = 0.

Particular Solution
If we are interested in a bounded sequence {y_t} mapped from a bounded sequence {x_t}, then we
need to set both of the constants c_1 and c_2 to zero and focus on the associated particular solution.
When λ_1 = λ_2 ≡ λ and |λ| < 1, we can show that
    Σ_{j=0}^{∞} (λL)^{j+1} = λL/(1 − λL).
Applying the derivative trick from Example 1.1.29 gives
    1/[(1 − λ_1 L)(1 − λ_2 L)] = 1/(1 − λL)^2 = Σ_{j=0}^{∞} (j + 1)(λL)^j.
Thus, in this case, the bounded solution to (8.9) is given by
    y_t = a/(1 − λL)^2 + b Σ_{j=0}^{∞} (j + 1)λ^j x_{t−j}.
If, on the other hand, |λ| > 1, we can show that
    1/(1 − λL) = −(λL)^{−1}/(1 − (λL)^{−1}) = −Σ_{j=0}^{∞} (λL)^{−(j+1)}.
Applying the derivative trick yields
    1/(1 − λL)^2 = Σ_{j=0}^{∞} (j + 1)(λL)^{−(j+2)}.
Thus, in this case, the bounded solution to (8.9) is given by
    y_t = a/(1 − λL)^2 + b Σ_{j=0}^{∞} (j + 1)λ^{−(j+2)} x_{t+j+2}.
8.4 Systems of Linear Difference Equations
8.4.1 Solution Technique with Real Eigenvalues
Consider the following k-dimensional system of equations:
    z_{t+1} = A z_t,
with z ∈ R^k and A ∈ R^{k×k}. Let λ_1, λ_2, . . . , λ_k be the eigenvalues of A and let v_1, v_2, . . . , v_k ∈ R^k
be the corresponding eigenvectors. Then the projection matrix is P = [v_1, v_2, . . . , v_k] and
    AP = [Av_1, Av_2, . . . , Av_k] = [λ_1 v_1, λ_2 v_2, . . . , λ_k v_k] = P diag(λ_1, λ_2, . . . , λ_k).
If P is invertible, we can obtain
    P^{−1} A P = diag(λ_1, λ_2, . . . , λ_k),      (8.17)
with λ_1, λ_2, . . . , λ_k distinct. Now we want to solve
    z_{t+1} = A z_t.
We can go through the following steps. First multiply the above equation by P^{−1} and use equation
(8.17) to obtain
    P^{−1} z_{t+1} = P^{−1} A z_t = diag(λ_1, λ_2, . . . , λ_k) P^{−1} z_t.
Then, if we define Z = P^{−1} z, we get the following decoupled system:
    Z_{t+1} = diag(λ_1, λ_2, . . . , λ_k) Z_t,
which, expanded out, can be written as
    [ Z_{1,t+1} ; Z_{2,t+1} ; . . . ; Z_{k,t+1} ] = [ λ_1 0 · · · 0 ; 0 λ_2 · · · 0 ; . . . ; 0 0 · · · λ_k ] [ Z_{1,t} ; Z_{2,t} ; . . . ; Z_{k,t} ].      (8.18)
The above form represents k separate difference equations that can be recursively solved to obtain
    Z_{1,t+1} = λ_1 Z_{1,t} = λ_1^2 Z_{1,t−1} = · · · = λ_1^{t+1} Z_{1,0}
    Z_{2,t+1} = λ_2 Z_{2,t} = λ_2^2 Z_{2,t−1} = · · · = λ_2^{t+1} Z_{2,0}
    . . .
    Z_{k,t+1} = λ_k Z_{k,t} = λ_k^2 Z_{k,t−1} = · · · = λ_k^{t+1} Z_{k,0},
which, in matrix form, can be represented as
    [ Z_{1,t+1} ; Z_{2,t+1} ; . . . ; Z_{k,t+1} ] = diag(λ_1^{t+1}, λ_2^{t+1}, . . . , λ_k^{t+1}) [ Z_{1,0} ; Z_{2,0} ; . . . ; Z_{k,0} ].
Moreover, since z ≡ P Z, the above system can be written as
    P [ Z_{1,t+1} ; . . . ; Z_{k,t+1} ] = P diag(λ_1^{t+1}, . . . , λ_k^{t+1}) [ Z_{1,0} ; . . . ; Z_{k,0} ]
                                        = P diag(λ_1^{t+1}, . . . , λ_k^{t+1}) P^{−1} P [ Z_{1,0} ; . . . ; Z_{k,0} ],
which implies
    [ z_{1,t+1} ; . . . ; z_{k,t+1} ] = P diag(λ_1^{t+1}, . . . , λ_k^{t+1}) P^{−1} [ z_{1,0} ; . . . ; z_{k,0} ].      (8.19)
If we define D = diag(λ_1, . . . , λ_k), we obtain
    z_{t+1} = P D^{t+1} P^{−1} z_0.
Alternatively we can use (8.19) to express the solution as
  t+1

λ1 c1
Z1,t+1
 Z2,t+1   λt+1 c2
  2

=

..
..
 

.
.
λt+1
k ck
Zk,t+1



,

which, if we multiply both sides of the above result by the projection matrix P , implies
 t+1 
λ1 c1
 λt+1 c2 
 2

zt+1 = [v1 , v2 , ...vk ] 

..


.
λt+1
k ck
117
A. W. Richter
8.4. SYSTEMS OF LINEAR DIFFERENCE EQUATIONS
t+1
t+1
= v1 λt+1
1 c1 + v2 λ2 c2 + · · · + vk λk ck ,
where vk ∈ Rk represents the kth eigenvector and the constant ck ≡ Zk,0 = P −1 zk,0 . The
following theorem summarizes this alternative approach.
Theorem 8.4.1 (Difference Equation with Real Roots). Let A be a k × k matrix with k distinct
real eigenvalues λ1 , . . . , λk and corresponding eigenvectors v1 , . . . , vk . The general solution of
the system of difference equations is
t+1
t+1
zt+1 = v1 λt+1
1 c1 + v2 λ2 c2 + · · · + vk λk ck .
(8.20)
Remark 8.4.1. Consider a system of two linear difference equations
xt+1 = axt + byt
yt+1 = cxt + dyt
or, in matrix form,
zt+1
a b xt
xt+1
=
≡
≡ Azt .
yt+1
c d yt
If b = c = 0 in these equations, they are uncoupled:
xt+1 = axt
yt+1 = dyt
and are easily solved as two separate one-dimensional problems:
x t = at x 0
and
yt = dt y0 .
When the equations are coupled (b 6= 0 or c 6= 0), the technique for solving the system is to find
a change of variables that decouples these equations. This is precisely the role of eigenvalues and
eigenvectors.
Example 8.4.1. Consider the following coupled system of difference equations
xt+1 = xt + 4yt
yt+1 = 0.5xt
or, in matrix form,
zt+1
1 4 xt
xt+1
=
≡
≡ Azt .
yt+1
0.5 0 yt
To find the eigenvalues solve the following
set
|A − λI| = (λ − 2)(λ + 1) = 0,
which implies that λ1 = 2 and λ2 = −1 are the the eigenvalues to this system. To find the
corresponding eigenvectors, row-reduce A − 2I and A + I to obtain the following equations
xt = 4yt
2xt = −4yt .
Normalizing yt to 1, we get the following basis vectors that form the relevant projection matrix
4 −2
P =
.
1 1
118
A. W. Richter
8.4. SYSTEMS OF LINEAR DIFFERENCE EQUATIONS
Applying the change of variables Z = P −1 z to the original difference equation, we obtain
Zt+1 = (P −1 AP )Zt
t
λ1 0
λ1 0
Z
=
Z =
0 λ2 t
0 λt2 0
t
2
0
c1
=
.
0 (−1)t c2
Thus, if we reapply the change of variables, we have
xt
t 4
t −2
= c1 2
+ c2 (−1)
,
yt
1
1
which is the same equation that we would have arrived at had we applied Theorem 8.4.1.
8.4.2 Solution Technique with Complex Eigenvalues
Consider the following two-dimensional system of equations
zt+1 = Azt ,
where A is a 2 × 2 matrix with complex eigenvalues α ± iβ. Applying the change of variables
z = P Z to the above difference equation yields
P Zt+1 = AP Zt
→
Zt+1 = P −1 AP Zt .
Since A is assumed to have complex eigenvalues, it has corresponding complex eigenvectors w1 =
u + iv and w2 = u − iv. Thus, the projection matrix is given by
P = [w1 , w2 ] = [u + iv, u − iv] .
Moreover, using equation (8.17), it then follows that
λ1 0
α + iβ
0
−1
P AP =
=
.
0 λ2
0
α − iβ
Thus, the decoupled system becomes
Xt+1
α + iβ
0
Xt
Zt+1 ≡
=
.
Yt+1
0
α − iβ Yt
Recursive substitution then yields
Xt = k1 (α + iβ)t
Yt = k2 (α − iβ)t ,
where the constant K ≡ [k1 , k2 ]T = Z0 = P −1 z0 could be real or complex. Using the fact that
zt = P Zt , we can transforming transform the variables into their original form to obtain
k1 (α + iβ)t
xt
= [u + iv, u − iv]
zt =
yt
k2 (α − iβ)t
= k1 (α + iβ)t (u + iv) + k2 (α − iβ)t (u − iv) .
119
(8.21)
A. W. Richter
8.4. SYSTEMS OF LINEAR DIFFERENCE EQUATIONS
Notice that this solution takes the same form as (8.20), except with complex eigenvalues and eigenvectors replacing the real eigenvectors and eigenvalues.
Since the original problem contained only real numbers, we would like to find a solution that
contains only real numbers. Since every solution of the system is contained in equation (8.21) for
different choices of K = [k1 , k2 ]T , we want to know if we can find parameters k1 and k2 so that
equation (8.21) is real.
Notice that except for the constant factors, the first term in equation (8.21) is the complex
conjugate of the second. Since the sum of any complex number and its conjugate is the real number
2α, we want to choose the first constant, k1 , to be any complex constant c1 + ic2 and let the second
constant, k2 , be its conjugate pair, c1 − ic2 . Then the first and second term in (8.21) turn out to be
complex conjugates and the sum of them will be a real solution given by
zt = (c1 + ic2 ) (α + iβ)t (u + iv) + (c1 − ic2 ) (α − iβ)t (u − iv)
= 2 Re (c1 + ic2 ) (α + iβ)t (u + iv) .
(8.22)
Applying Demoivre’s Formula (7.2), the above result can be written as
z = 2 Re (c1 + ic2 ) r t [cos(tθ) + i sin(tθ)] (u + iv)
= 2r t Re {[(c1 cos(tθ) − c2 sin(tθ)) + i (c2 cos(tθ) + c1 sin(tθ))] (u + iv)}
= 2r t [(c1 cos(tθ) − c2 sin(tθ)) u − (c2 cos(tθ) + c1 sin(tθ)) v] ,
which is now a real solution.
Theorem 8.4.2. Let A be a 2 × 2 matrix with complex eigenvalues α∗ ± iβ ∗ and corresponding
eigenvectors u∗ ± iv∗ . Write eigenvalues in polar coordinates as r ∗ [cos(θ ∗ ) + i sin(θ ∗ )], where
∗ ∗
q
α β
2
2
∗
∗
∗
∗
∗
,
.
r = (α ) + (β ) and (cos(θ ), sin(θ )) =
r∗ r∗
Then the general solution of the difference equation zt+1 = Azt is
zt = (r ∗ )t [(c1 cos(tθ ∗ ) − c2 sin(tθ ∗ )) u∗ − (c2 cos(tθ ∗ ) + c1 sin(tθ ∗ )) v∗ ] .
Example 8.4.2. In Example 7.3.1, we found that the eigenvalues of
1 1
A=
−9 1
are 1 ± 3i with corresponding eigenvectors
1
0
±i
.
0
3
In polar coordinates, the eigenvalues become
√
√
3
1
= 10 (cos θ ∗ + i sin θ ∗ ) ,
1 + 3i = 10 √ + i √
10
10
where θ ∗ = arccos √110 ≈ 71.565◦ or 1.249 radians. The general solution for the system
zt+1
xt
xt+1
1 1
≡
=
−9 1
xt
xt+1
120
A. W. Richter
8.4. SYSTEMS OF LINEAR DIFFERENCE EQUATIONS
is given by
√
xt
1
0
= ( 10)t (c1 cos(tθ ∗ ) − c2 sin(tθ ∗ ))
− (c2 cos(tθ ∗ ) + c1 sin(tθ ∗ ))
yt
0
3
∗
∗
√
c1 cos(tθ ) − c2 sin(tθ )
= ( 10)t
.
−3c2 cos(tθ ∗ ) − 3c1 sin(tθ ∗ )
Remark 8.4.2. In higher dimensions, a given matrix can have both real and complex eigenvalues.
The solution of the corresponding system of difference equations is the obvious combination of the
solutions described in Equation 8.4.1 and Theorem 8.4.2.
121
Chapter 9
Linear Differential Equations
An ordinary differential equation (ODE) is an equation ẏ = F (y, t) between the derivative of an
unknown function y(t) and an expression F (y, t) involving y and t. If the expression F (y, t) does
not involve t, that is, if the equation can be written as ẏ = F (y), we call it an autonomous or
time-independent differential equation. If the equation specifically involves t, we call the equation
non-autonomous or time-dependent.
9.1 First-Order Differential Equations
R 2
Many innocent looking integration problems, like et dt, have no closed form solutions that can be
written down explicitly. Unfortunately, the same thing is true for more general differential equations.
Even simple-looking differential equations like ẏ = y 2 + t2 have no explicit solutions. This section
will list the most important classes that do have explicit solutions and will compute their solutions.
In general, there are four main types of ordinary differential equations that have explicit solutions.
1. Autonomous, separable ODE:
ẏ = ay
This equation is the simplest to solve and can be solved by separating the variables in the
following manner:
Z
Z
dy
dy
dy
= ay
→
= adt
→
= adt
ẏ =
dt
y
y
→ ln(y) = at + c
→ y = eat+c
Thus, the solution is given by
y(t) = keat ,
where k = ec .
2. Non-autonomous, non-separable ODE:
ẏ = ay + b
In order to solve this equation, conduct the following steps:
dy
dy
= ay + b →
− ay e−at = be−at
ẏ =
dt
dt
122
→
d −at ye
= be−at
dt
A. W. Richter
9.1. FIRST-ORDER DIFFERENTIAL EQUATIONS
Z
b
be−at = − e−at + k
a
Thus, the solution is given by
→
−at
ye
b
y(t) = − + keat ,
a
for some constant k.
3. Non-autonomous, separable ODE:
ẏ = a(t)y
In order to solve this equation, conduct
variables gives
dy
ẏ =
= a(t)y →
dt
Z
→ ln(y) = a(t)dt + c
steps similar to the first equation. Separating the
dy
= a(t)dt
y
R
→
y=e
→
Z
dy
=
y
Z
a(t)dt
a(t)dt+c
Thus, the solution is given by
R
y = ke
a(t)dt
where k = ec .
4. Non-autonomous, non-separable ODE:
ẏ = a(t)y + b(t)
In order to solve this equation, conduct steps similar to the second equation. Using the integrating factor gives
R
R
dy
= a(t)y + b(t) → [ẏ − a(t)y] e− a(t)dt = b(t)e− a(t)dt
ẏ =
dt R
Z
d ye− a(t)dt
R
R
R
→
= b(t)e− a(t)dt → ye− a(t)dt = b(t)e− a(t)dt dt + c
dt
Thus, the solution is given by
R
Z
R
− a(t)dt
b(t)e
dt + k e a(t)dt
y=
Remark 9.1.1. Note the useful formula for differentiating an integral
Z
d t
f (x) dx = f (t) ,
dt a
where a is constant. Then
Rt
d ye− 0 a(s)ds
Rt
Rt
= ẏe− 0 a(s)ds + ye− 0 a(s)ds (−a (t)) ,
dt
Rt
= [ẏ − a (t) y] e− 0 a(s)ds .
The extension of the above formula, which is known as Leibniz integral rule, gives a formula for
differentiation of a definite integral whose limits are functions of the differential variable,
!
Z v(x)
Z b(x)
∂
d
′
′
(f (x, t)) dt.
f (x, t) dt = v (x) f (x, b (x)) − u (x) f (x, a (x)) +
dx
∂x
u(x)
a(x)
123
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
9.2 Solving Systems of Differential Equations
9.2.1 Preliminaries
Let I be an open interval that contains a fixed value a. If f (t) is n × 1 and A(t) is n × n, and both
f (t) and A(t) are continuous on I (i.e., each entry is continuous), then the first-order linear system
of differential equations
x′ (t) = A(t)x(t) + f (t), x(a) = x0
(9.1)
has exactly one solution for the n × 1 vector function x(t) on I for each choice of x0 . The general
solution of
x′ (t) = A(t)x(t) + f (t)
on I is x(t) = xc (t) + xp (t), where xc is the general solution of the homogeneous system x′ (t) =
A(t)x(t) and xp is one particular solution of the nonhomogeneous system x′ (t) = A(t)x(t) + f (t).
The complementary solution xc can be written as
xc (t) = c1 x1 (t) + · · · + cn xn (t)
where x1 , . . . , xn are linearly independent solutions of the homogeneous system on I and c1 , . . . , cn
are constants.
Example 9.2.1. The first order system
x′1 = 4x1 − 3x2
x′2 = 6x1 − 7x2
can be written as the single matrix equation
dx
4 −3
x = Ax
=
6 −7
dt
T
where x1 x2 . To verify that the vector functions
−5t 2t e
3e
and x2 (t) =
x1 (t) =
3e−5t
2e2t
are both solutions of the matrix differential equation with coefficient matrix A, we need only calculate
2t 6e
4 −3 3e2t
= x′1
=
Ax1 =
4e2t
6 −7 2e2t
and
Ax2 =
4 −3
6 −7
e−5t
5e−5t
=
= x′2
3e−5t
−15e−5t
Moreover, the linear combination
x(t) = c1 x1 (t) + c2 x2 (t) = c1
−5t 3e2t
e
+
c
2
2e2t
3e−5t
is also a solution. Note that this can be verified in the same manner as the individual solutions
presented above.
124
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
Remark 9.2.1. Solutions x1 (t), . . . , xn (t) of the homogeneous
system are
linearly independent
on I iff the Wronskian defined by W (x1 , . . . , xn ) := det x1 · · · xn is not zero for some
t ∈ I.
Example 9.2.2. It is readily verified, as in the previous example, that
 5t 
 3t 
 t
2e
2e
2e
x1 (t) = 2et  , x2 (t) =  0  , and x3 (t) = −2e5t 
e5t
−e3t
et
are the solutions of the equation


3 −2 0
dx 
= −1 3 −2 x
dt
0 −1 3
T
where x1 x2 x3 . The Wronskian of these solutions is
t
2 2
2e 2e3t
2 2e5t t
0
−2e5t = e9t 2 0 −2 = −16e9t
W = 2e
1 −1 1 et −e3t
e5t which is never zero. Hence the solutions x1 , x2 and x3 are linearly independent (on any open
interval).
Example 9.2.3. The nonhomogeneous linear system
x′1 = 3x1 − 2x2 − 9t + 13
x′2 = −x1 + 3x2 − 2x3 + 7t − 15
x′3 = −x2 + 3x3 − 6t + 7
can be represented in matrix form as




3 −2 0
−9t + 13
dx 
= −1 3 −2 x +  7t − 15  = Ax + f (t)
dt
0 −1 3
−6t + 7
T
where x1 x2 x3 . In Example 9.2.2 we saw that a general solution of the associated homogeneous linear system


3 −2 0
dx 
= −1 3 −2 x
dt
0 −1 3
is given by

2c1 et + 2c2 e3t + 2c3 e5t

2c1 et − 2c3 e5t
xc (t) = 
t
3t
5t
c1 e − c2 e + c3 e

125
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
and we can verify, by substitution, that the function
 
3t
xp (t) =  5 
2t
is a particular solution of the original nonhomogeneous system. Consequently, the general solution
of the nonhomogeneous system is given by
x(t) = xc (t) + xp (t)
9.2.2 Homogeneous Systems
Definition 9.2.1. A matrix X(t) = x1 (t) · · · xn (t) of linearly independent solutions of the
homogeneous linear system x′ (t) = A(t)x(t) is a fundamental matrix for the homogeneous linear
system.
If X(t) = x1 (t) · · · xn (t) is any fundamental matrix for the homogeneous linear system,
then the general solution can be written as
x(t) = c1 x1 (t) + · · · + cn xn (t) = X(t)c
(9.2)
T
where c = c1 · · · cn is constant. If x(a) is a given initial condition, then x(a) = X(a)c
implies X(a)−1 x(a) = c. Therefore,
x(t) = X(t)X(a)−1 x(a)
(9.3)
and we may use the row reduction X(a) x(a) ∼ In c to find c once X(a) and x(a) are
known.
Remark 9.2.2.
1. Since W (x1 , . . . , xn ) = det X(t) 6= 0 for each t in I, we know that X(a)−1 always exists.
2. There is no significant loss of generality if we assume a = 0 in (9.1) since we can always
make a change of variables. In what follows, we will assume a = 0 in all initial value
problems.
9.2.3 Constant Coefficient Homogeneous Systems
Let A be a constant n × n matrix. We solve the constant coefficient homogeneous system
x′ (t) = Ax(t).
(9.4)
The Diagonalizable Case
When A is diagonalizable, there exist n linearly independent (possibly complex) eigenvectors
v1 , . . . , vn and corresponding (possibly complex) eigenvalues λ1 , . . . , λn . (The eigenvalues need
not be distinct.) Let xk (t) = eλk t vk for k = 1, . . . , n. Then each xk is a solution since
x′k (t) = λk eλk t vk
= eλk t (λk vk )
126
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
= eλk t Avk
= A(eλk t vk )
= Axk (t).
The solutions xk are linearly independent because
W (x1 , . . . , xn ) = det eλ1 t v1 · · · eλ1 t vn
= eλ1 t · · · eλn t det v1 · · · vn
is never zero on I since eλk t is never zero for each k and the vectors v1 , . . . , vn are linearly independent. Therefore, the general solution of (9.4) is
x(t) = c1 eλ1 t v1 + · · · + cn eλn t vn = X(t)c
(9.5)
where X(t) = eλ1 t v1 · · · eλn t vn is a fundamental matrix for the system that satisfies X(0) =
v1 · · · vn . Defining x(0) = x0 , (9.5) yields c = X(0)−1 x0 . Therefore,
x(t) = X(t)X(0)−1 x0
(9.6)
Example 9.2.4. Find a general solution of the system
x′ = 4x + 2y
y ′ = 3x − y
Then find a fundamental matrix for the system, and use it to solution of the above system that
satisfies the initial conditions x(0) = 1 and y(0) = −1.
Solution: The matrix form of the above system is
4 2
′
x =
x
3 −1
T
where x1 x2 . The characteristic equation of the coefficient matrix is
4 − λ
2 = (4 − λ)(−1 − λ) − 6
3
−1 − λ
set
= λ2 − 3λ − 10 = (λ + 2)(λ − 5) = 0
Thus, we have distinct real eigenvalues λ1 = −2 and λ2 = 5. Substitution of the first eigenvalue
λ1 = 2 into the eigenvector equation (A − λI)v = 0 yields
0
6 2 v11
=
0
3 1 v21
Thus, using the first equation in the above system gives
6v11 + 2v12 = 0
T
Normalizing v11 then implies that the corresponding eigenvector is v1 = 1 −3 . Similarly,
T
v2 = 2 1 . Thus, these two eigenvalues and their associated eigenvectors yield the two solutions
2 5t
1
−2t
e
e
and x2 (t) =
x1 (t) =
1
−3
127
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
They are linearly independent, because their Wronskian
−2t
e
2e5t 3t
−3e−2t e5t = 7e
is nonzero. Hence a general solution of the original system is
1
2 5t
x(t) = c1 x1 (t) + c2 x2 (t) = c1
e−2t + c2
e
−3
1
The linearly independent solutions found above yield the fundamental matrix
−2t
e
2e5t
X(t) =
−3e−2t e5t
Then
X(0) =
1 2
−3 1
and
X(0)−1 =
1 1 −2
7 3 1
Hence the formula in (9.6) gives the solution
−2t
1
e
2e5t
1 −2
1
x(t) =
−3e−2t e5t
3 1
−1
7
1 3e−2t 4e5t
=
7 −3e−2t 2e5t
Remark 9.2.3. An advantage of the fundamental matrix approach is that once we know the fundamental matrix X(t) and the inverse matrix X(0)−1 , we can calculate rapidly, by matrix multiplication, the solutions corresponding to the different initial conditions.
Remark 9.2.4. Another way to think about what just occurred is to begin
with x′ = Ax and then
−1
make the change of variables w = P x, where P = v1 · · · vn is an invertible matrix made
up of n linearly independent eigenvectors that correspond to the eigenvalues λ1 , . . . , λn . Then
AP = A v1 · · · vn
= Av1 · · · Avn
= λ1 v1 · · · λn vn


 

0
λ1


. 


 . 

0
v1 · · · vn  .  
=  v1 · · · vn  .  · · ·
 0 
 .. 

0
λn


λ1 0 · · · 0

.. 
 0 λ2 · · ·
. 

= v1 · · · vn 

 ..
..
 . ···
. 0 
0 ···
0 λn


λ1 0 · · · 0

.. 
 0 λ2 · · ·
. 


=P .

 .. · · · . . . 0 
0 ···
0 λn
128
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
= PD
where D is the diagonal matrix with the eigenvalues λ1 , . . . , λn on the diagonal. Since P is invertible, we have A = P DP −1 and D = P −1 AP (i.e., A is similar to D). Since A = P DP −1 and
P w = x, the system x′ = Ax can be rewritten as
d x = Ax
dt
i
dh
P w = P DP −1 x
dt
d P
w = P Dw
dt
and thus by multiplying both sides on the left by P −1 , we obtain the “decoupled” system w′ = Dw.
This is a simpler version of the problem since D is simpler than A. Solving the decoupled system,
we have
 ′
  

w1
w1
λ1 w1
 
  

w′ =  ...  = D  ...  =  ... 
wn′
wn
λn wn
and so wk′ = λk wk for k = 1, . . . , n. Hence wk = wk (0)eλk t for each k and
 λt

  λt

e 1 w1 (0)
w1 (0)
e 1
O

  .. 
 
..
..
w=
  .  = eDt w(0) = eDt P −1 x(0)
=
.
.
eλn t wn (0)
eλn t
O
wn (0)
where eDt is defined as the diagonal matrix with diagonal entries eλk t . Now that w is found, we
can find x via x = P w. Therefore,
x = P w = P eDt w(0) = P eDt P −1 x(0)
and so we have solved for x once we find the eigenvalues and
n linearly independent eigenvectors
of A. The matrix X(t) = P eDt = eλ1 t v1 · · · eλn t vn is a fundamental matrix that satisfies
X(0) = P I = P . (The matrix X(t) = P eDt P −1 is also a fundamental matrix, but in this case
X(0) = P IP −1 = I.)
Example 9.2.5. Consider the system
ẋ = 2x + y
ẏ = −12x − 5y
We can write this in vector form as
x
2
1
ẋ
′
= Az(t)
=
z (t) ≡
y
−12 −5
ẏ
The characteristic polynomial is given by
p(λ) = (2 − λ)(−5 − λ) + 12 = (λ + 1)(λ + 2)
129
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
Thus, corresponding roots are λ1 = −1 and λ2 = −2. Now use λ1 to calculate the the first
eigenvector for matrix A.
0
3
1
v11
=
[A − λI]v =
0
−12 −4
v12
Using the first row in the above equation yields
3v11 + v12 = 0
T
T
Normalizing v11 implies that v1 = 1 −3 . Similarly, we can obtain v2 = 1 −4 . Hence,
the general family of solutions can be written as
x(t)
1
1
−t
−2t
= c1 e
+ c2 e
y(t)
−3
−4
This system needs two initial conditions to pin down particular solutions. If we assume that x(0) =
1 and y(0) = 0, then we get
1 = c1 + c2
0 = −3c1 − 4c2
It follows that c1 = 4 and c2 = −3. The particular solutions are then given by
x(t) = 4e−t − 3e−2t
y(t) = −12e−t + 12e−2t
Equivalently we could have computed
x = P eDt P −1 x(0)
−1 −t
1
1
1
e
0
1
1
=
0
0 e−2t −3 −4
−3 −4
−t
4e − 3e−2t
=
12e−2t − 12e−t
The Non-diagonalizable Case
Suppose X(t) = x1 (t) · · · xn (t) is a fundamental matrix for the homogeneous system (9.4).
Then the columns of X(t) are linearly independent and
d x1 (t) · · · xn (t)
dt
= x′1 (t) · · · x′n (t)
= Ax1 (t) · · · Axn (t)
= A x1 (t) · · · xn (t)
X ′ (t) =
= AX(t).
That is, fundamental matrices corresponding to (9.4) satisfy the matrix differential equation
= AX(t). Conversely, any matrix X(t) with linearly independent columns that satisfies X ′ (t) = AX(t) must be a fundamental matrix for the system. By analogy with the scalar
X ′ (t)
130
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
differential equation y ′ (t) = αy(t), which has solution y(t) = y(0)eαt , we might guess1 that
X(t) = eAt X(0) solves X ′ (t) = AX(t). Of course, to make sense out of this formula, we need to
decide on what it means to raise e to a matrix power.
Definition 9.2.2. If M is an n × n matrix, the exponential matrix eM is defined by
eM = I + M +
∞
X 1
1 2
1
M + M3 + · · · =
M k.
2!
3!
k!
k=0
Some properties are:
eO = I
αI
e
(O is a square zero matrix)
α
=e I
(α is any scalar)


d1 · · · 0


D =  ... . . . ...  diagonal
0 · · · dn
=⇒


ed1 · · ·
0

..  is diagonal
eD =  ... . . .
.
d
n
0 ··· e
M and N commute (i.e., M N = N M ) =⇒ eM +N = eM eN = eN eM
−1
= e−M
(eM is always invertible)
eM
d At
e = AeAt (for a brief explanation, see the note2 below)
dt
d At
e = AeAt , we know that each column of eAt is a solution of the homogeneous
Because dt
system. Furthermore, the columns of eAt are linearly independent since eAt is always invertible,
and so it follows that eAt is a fundamental matrix for the homogeneous system. From (9.2), it also
follows that the general solution of the homogeneous system is
x(t) = eAt c = eAt x(0).
Remark 9.2.5. If X(t) is any fundamental matrix for the homogeneous system, it can be shown
that
X(t) = eAt X(0)
and so
eAt = X(t)X(0)−1 .
(9.7)
d
The order of the factors in X(t) = eAt X(0) is significant. As we shall soon see, dt
[eAt ] = AeAt and so
At
At
At
At
d
d
X (t) = dt [e X(0)] = Ae X(0) = AX(t). However, dt [X(0)e ] = X(0)Ae 6= AX(0)eAt because X(0)
need not commute with other matrices.
2
By differentiating term by term and then re-indexing the series, we get
1
′
∞
∞
∞
∞
X
X
d At X k k k−1 X
1
1
1 i i
e =
A t
=
Ak tk−1 = A
Ak−1 tk−1 = A
A t = AeAt .
dt
k!
(k
−
1)!
(k
−
1)!
i!
i=0
k=1
k=1
k=1
131
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
Therefore, the general solution x = X(t)c = X(t)X(0)−1 x(0) of (9.4) may be determined
providing we can reasonably compute eAt .
Everything mentioned so far in this section also applies to the
diagonalizable case. Let’s briefly
return to that case in order to illustrate the theory. If X(t) = eλ1 t v1 · · · eλn t vn is a fundamental matrix, then
eAt vk = etλk I+(A−λk I)t vk
= etλk I e(A−λk I)t vk
(tλk I and (A − λk I)t commute)
2
t
(eλk tI = eλk t I)
= eλk t I I + t(A − λk I) + (A − λk I)2 + · · · vk
2!
t2
λk t
2
= e I vk + t(A − λk I)vk + (A − λk I) vk + · · ·
2!
= eλk t [vk + 0 + · · · ]
= eλk t vk
(series truncates since (A − λk I)p vk = 0 for each p ≥ 1)
for each k = 1, . . . , n. It is then easy to see that
X(t) = eλ1 t v1 · · · eλn t vn = eAt v1 · · ·
eAt vn = eAt v1 · · ·
vn .
Returning to the non-diagonalizable case, we wish to generalize what occurred in the diagonalizable case
independent vectors u1 , . . . , un so that a fundamental matrix is
by finding linearly
At
X(t) = e
u1 · · · un , where the vectors u1 , . . . , un are chosen so that the infinite series for
eAt uk truncates for each k = 1, . . . , n. To that end, suppose λ is an eigenvalue of A and there is
some r ≥ 1 so that (A − λI)r u = 0. Then for any p ≥ r we have
(A − λI)p u = (A − λI)p−r (A − λI)r u = (A − λI)p−r 0 = 0
and so the series for eAt u truncates. Specifically,
tr−1
t2
(A − λI)r−1 u
eAt u = eλt u + t(A − λI)u + (A − λI)2 u + · · · +
2!
(r − 1)!
2
r−1
t
t
λt
2
r−1
= e I + t(A − λI) + (A − λI) + · · · +
(A − λI)
u.
2!
(r − 1)!
(9.8)
Definition 9.2.3. Let A be an n × n matrix. If λ is an eigenvalue of A, then u is a generalized
eigenvector of A associated with λ if there is some positive integer r such that
(A − λI)r u = 0 and
(A − λI)r−1 u 6= 0.
In this case we say that u has rank r.
Remark 9.2.6.
1. If u is a generalized eigenvector, then u 6= 0.
2. Eigenvectors are rank 1 generalized eigenvectors.
3. If λ has multiplicity m (i.e., the factor that corresponds to λ in the characteristic polynomial
has power m) and u is a rank r generalized eigenvector associated with λ, then 1 ≤ r ≤ m.
132
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
4. If λ has multiplicity m and u is a generalized eigenvector associated with λ, the rank of u is
the smallest power r with 1 ≤ r ≤ m such that (A − λI)r u = 0.
To summarize, if we can find linearly independent generalized eigenvectors u1 , . . . , un , then
the general solution of the homogeneous system (9.4) is given by
x(t) = X(t)c = eAt u1 · · · un c
= eAt u1 · · · eAt un c
= c1 eAt u1 + · · · + cn eAt un .
(9.9)
Remark 9.2.7. Note how (9.9) resembles (9.5). In fact, if D is the diagonal matrix of eigenvalues
when A is diagonalizable, equation (9.5) can be written in matrix form as x(t) = eAt P c. Thus, the
only real difference between the diagonalizable and non-diagonalizable cases is that the matrix P
of independent eigenvectors is replaced by a matrix of independent generalized eigenvectors.
By a theorem from linear algebra, the existence of linearly independent generalized eigenvectors
u1 , . . . , un is guaranteed. They are made up of ordinary eigenvectors associated with eigenvalues
whose eigenspace dimensions equal their multiplicities together with generalized eigenvectors with
ranks greater than 1 that are associated with eigenvalues whose eigenspace dimensions are less than
their multiplicities.
Definition 9.2.4. If λ is an eigenvalue of a n × n matrix A, then λ is deficient when the number of
linearly independent associated eigenvectors is less than the multiplicity of λ.
Putting It All Together
Suppose the n × n homogeneous linear system (9.4) has p distinct eigenvalues λ1 , λ2 , . . . ,λp with
corresponding multiplicities m1 , m2 , . . . , mp . Note that p ≤ n, each mk ≥ 0, and m1 + m2 + · · · +
mp = n. For each non-deficient eigenvalue λi we determine mi linearly independent eigenvectors
(i.e., rank 1 generalized eigenvectors) by finding mi linearly independent solutions of (A−λi I)u =
0. For each deficient eigenvalue λj we determine mj linearly independent generalized eigenvectors
with rank rj > 1 by finding mj linearly independent solutions of (A − λj I)rj u = 0. This means
we want to find the smallest rj with 1 < rj ≤ mj such that the nullspace of (A − λI)rj is mj dimensional. Equivalently, we want to find the smallest power rj with 1 < rj ≤ mj so that the
reduced row-echelon matrix of (A − λI)rj has mj columns that don’t contain leading 1s (we want
mj free variables). Then we may create mj linearly independent solutions that correspond to the
eigenvalue λj by defining
xj1 (t) = eAt uj1
..
.
xjmj (t) = eAt ujmj
where uj1 , . . . , ujmj are rank rj linearly independent generalized eigenvectors of A associated with
λj . (That is, the vectors uj1 , . . . , ujmj are linearly independent solutions of (A − λI)rj u = 0.) Then
a fundamental matrix for the homogeneous system (9.4) is
X(t) = eAt u11 · · · u1m1 u21 · · · u2m2 · · · up1 · · · upmp
and the general solution of the homogeneous system is therefore
x(t) = X(t)c.
133
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
Remark 9.2.8. Suppose λ is a deficient eigenvalue with multiplicity m such that there are at
most q linearly independent associated eigenvectors with 1 ≤< m. It might seem reasonable
to begin with whatever independent eigenvectors you have, say v1 , . . . , vq , and then add m − q
independent
generalized eigenvectors u1 , . . . , um−q to get a potential general solution, given by
eAt v1 · · · vq u1 · · · um−q . Unfortunately, it turns out that the set of eigenvectors and
generalized eigenvector, {v1 , . . . , vq , u1 , . . . , um−q }, is not necessarily linearly independent even
though both {v1 , . . . , vq } and {u1 , . . . , um−q } are. To illustrate, if


0
1
2
x′ (t) = Ax(t) =  −5 −3 −7  x(t)
1
0
0
then λ = −1 is the only eigenvalue (it has multiplicity 3). Since
  

 

1
1
2
1 0 1
 −1 
A + I =  −5 −2 −7  ∼  0 1 1  =⇒ E−1 = Span  −1 


1
0
1
0 0 0
1
T
we know that e−t v1 is a solution, where v1 = −1 −1 1 . We next attempt to find additional
solutions eAt u1 and eAt u2 , where u1 , u2 are rank r > 1 generalized eigenvectors. Since r ≤ 3, it
must be that r = 2 and so we get
 

 

 
−1
−3
−2 −1 −3
1 12 23
2







=⇒ u1 =
2 , u2 =
0 .
(A + I) = −2 −1 −3 ∼ 0 0 0
2
0 0 0
0
2
1
3
However, {v1 , u1 , u2 } is not independent since u2 − u1 = 2v1 . It is not possible to find two rank 2
generalized eigenvectors to combine with v1 to get three linearly independent vectors. The correct
procedure is to find three linearly independent generalized rank 3 eigenvectors.
Key Example

5 20 30
We want to determine eAt if A =  0 10 20  by using the formula eAt = X(t)X(0)−1 (cf.
0 0 5
equation (9.7)). Specifically, we shall perform the following steps:

1. Since A is 3 × 3, we find the eigenvalues λ1 , λ2 , λ3 and corresponding linearly independent
(possibly generalized) eigenvectors u1 , u2 , u3 .
2. Calculate eAt uk for k = 1, 2, 3. (The series for eAt uk truncates for each k. See equation
(9.8).)
3. Set X(t) = eAt u1 eAt u2 eAt u3 .
4. Since X(t) is a fundamental matrix of the system x′ (t) = Ax(t), we can write eAt =
−1
X(t)X(0)−1 = eAt u1 eAt u2 eAt u3 u1 u2 u3
.
First of all, the eigenvalues of A are λ = 5, 10 since A is triangular. The eigenvalue λ = 10 has
multiplicity 1 and λ = 5 has multiplicity 2.
λ = 10: The multiplicity of λ = 10 is 1, and so we need to find one eigenvector.
134
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS

 
−5 20 30
1 −4
A − 10I =  0 0 20  ∼  0
0
0 0 −5
0
0

0
1
0
=⇒
 
4
u1 =  1 
0
λ = 5: Since the multiplicity of λ = 5 is 2, we need to find two (possibly generalized) independent eigenvectors.

 

There are fewer than two free variables. We don’t
0 1 0
0 20 30
have the required two independent eigenvectors,
A − 5I =  0 5 20  ∼  0 0 1 
so we keep looking.
0 0 0
0 0 0

 
0 1 4 We now have two free variables, so there are the
0 100 400
(A − 5I)2 =  0 25 100  ∼  0 0 0  required two independent (rank 2) generalized
0 0 0 eigenvectors.
0
0
0

 
 
0
1
Based on the last step, we have generalized eigenvectors u2 =  0  and u3 =  −4 . If
0
1
At u
′ (t) = Ax(t) and you can check that
we let X(t)
=
e
u
u
,
then
each
column
solves
x
1
2
3
X(0) = u1 u2 u3 is invertible. Therefore, X(t) is a fundamental matrix for the system. Since
the series for eAt uk truncates for each k, we get
eAt u1 = e10t [I]u1
 
4
10t  
=e
1
0
(u1 has rank 1)
eAt u2 = e5t [I + t(A − 5I)] u2
  



1
0 20 30
1 0 0
= e5t  0 1 0  + t  0 5 20   0 
0
0 0 0
0 0 1

 
1
20t
30t
1
= e5t  0 1 + 5t 20t   0 
0
0
1
0
 
1
= e5t  0 
0
(u2 has rank 2)
eAt u3 = e5t [I + t(A − 5I)] u3

 
1
20t
30t
0
= e5t  0 1 + 5t 20t   −4 
0
0
1
1


−50t
5t 
=e
−4 
1
(u3 has rank 2)
135
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
and so

 
 


4
1
−50t
X(t) =  e10t  1  e5t  0  e5t  −4  
0
0
1

 10t 5t
5t
4e
e
−50te
10t

0
−4e5t  .
= e
0
0
e5t
We are now in a position to determine eAt . Because X(t) = eAt X(0), we obtain
eAt = X(t)X(0)−1
−1
4
1
0
= X(t)  1
0 −4 
0
0
1


0
1
4
= X(t)  1 −4 −16 
0
0
1

 10t 5t

0
1
4
4e
e
−50te5t
0
−4e5t   1 −4 −16 
=  e10t
0
0
e5t
0
0
1

 5t
10t
5t
10t
5t
e
4e − 4e
16e − e (16 + 50t)
.
= 0
e10t
4e10t − 4e5t
5t
0
0
e

Remark 9.2.9. The general solution of the constant coefficient homogeneous linear system x′ (t) =
T
Ax(t) is then given by x(t) = eAt x(0). For instance, if x(0) = 15 −1 2 , then you can
check that


28e10t − e5t (13 + 100t)
.
x(t) = eAt x(0) = 
7e10t − 8e5t
5t
2e
Note that once eAt is known, then we can easily solve the IVP x′ (t) = Ax(t), x(0) = x0 for any
x0 by simply multiplying x0 by eAt , i.e., x(t) = eAt x0 .
9.2.4 Constant Coefficient Homogeneous Systems Revisited
Now suppose that A = QJQ−1 for some Q (i.e., A is similar to J) such that J is an “almost”
diagonal matrix in the sense that J = D + N , D is a diagonal matrix of the eigenvalues λ1 , . . . , λn
of A, N r is the zero matrix for some r ≥ 1, and DN = N D. Proceeding along the lines of the
diagonalizable case, we obtain
eAt = QeJt Q−1
= QeDt+N t Q−1
= QeDt eN t Q−1
tr−1
t2 2
r−1
Dt
N
Q−1
= Qe
I + tN + N + · · · +
2!
(r − 1)!
136
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS

eλ1 t
0 ···
 0 eλ2 t · · ·

= Q .
..
 ..
.
···
0
···
0
0
0
..
.
eλn t
Remark 9.2.10.


t2 2
tr−1

r−1
N
Q−1 .
 I + tN + N + · · · +

2!
(r − 1)!
(9.10)
1. A theorem in linear algebra guarantees that it is always possible to write A in the manner
described above. The matrix J is not unique.
2. A suitable matrix Q can be formed by using linearly independent generalized eigenvectors.
(The matrix Q is not unique, just as the matrix P wasn’t unique in the diagonalizable case.)
3. The matrix N is referred to as a nilpotent matrix. Any square matrix N such that N p is the
zero matrix for some p ≥ 1 is said to be nilpotent.
4. Solving for N in A = Q(D + N )Q−1 , we get N = Q−1 AQ − D.
5. A is diagonalizable iff N is the zero matrix, in which case eN t = I and Q = P .
Key Example Revisited


5 20 30
We want to determine eAt if A =  0 10 20  by using (9.10). Specifically, we shall perform the
0 0 5
following steps:
1. Since A is 3 × 3, we find the eigenvalues λ1 , λ2 , λ3 and correspondinglinearly independent
(possibly generalized) eigenvectors u1 , u2 , u3 . Let Q = u1 u2 u3 .


λ1 0 0
2. Form N = Q−1 AQ − D, where D =  0 λ2 0 . Verify that N D = DN and find the
0 0 λ3
smallest positive integer r such that N r = O.
3. Then
eAt = QeDt eN t Q−1
= u1 u2


λ1 t
e
0
0
u3  0 eλ2 t
0 
λ
0
0 e 3t
−1
tr−1
t2
.
N r−1 u1 u2 u3
I + tN + N 2 + · · · +
2!
(r − 1)!
In section 9.2.3 of this document, we found
A. Using those vectors, let

4
Q = u1 u2 u3 =  1
0
linearly independent generalized eigenvectors for

1
0
0 −4 
0
1
137
A. W. Richter
9.2. SOLVING SYSTEMS OF DIFFERENTIAL EQUATIONS
to get




0
1
4
10 0 0
N = Q−1 AQ − D =  1 −4 −16  AQ −  0 5 0 
0
0
1
0 0 5


0 0
0
=  0 0 −50  .
0 0
0
You can check that N D = DN and that N 2 is the zero matrix. Hence eN t = I + tN , i.e., the
series for eN t truncates. Therefore,
eAt = QeDt (I + tN ) Q−1
  10t




4
1
0
e
0
0
1 0
0
0
1
4
0 −4   0 e5t 0   0 1 −50t   1 −4 −16 
= 1
0
0
1
0
0 e5t
0 0
1
0
0
1

 5t
e
4e10t − 4e5t 16e10t − e5t (16 + 50t)

= 0
e10t
4e10t − 4e5t
5t
0
0
e
which is the same answer we previously obtained.


0
0
5 −50 . If the generalized eigenvectors
0
5


200 50
0
that make up the matrix Q are chosen with a bit of extra care, say Q =  50 0
4 , then
0 0 −1




10 0 0
0 0 0
J =  0 5 1  = D + N where N =  0 0 1 . Note that this new matrix Q differs only
0 0 0
0 0 5
from the old matrix by multiplying the columns by some nonzero scalars (200, 50, and −1). This
new, slightly simpler, J we’ve found is called the Jordan normal form of A.
10
Remark 9.2.11. In the preceding example, J =  0
0
Remark 9.2.12. The steps in above example work out more simply in problems where Q can be
chosen to be the identity matrix. (This amounts to saying that any n linearly independent vectors
are, in fact, n linearly independent generalized eigenvectors.) This occurs when A can be written as
A = D+N , where D is some diagonal matrix and N is some nilpotent matrix such that N D = DN
(D and N commute).
Remark 9.2.13. In the preceding example, it is possible to write

 

 
0 20 30
5 0 0
5 20 30
A =  0 10 20  =  0 10 0  +  0 0 20  =: D + N
0 0 0
0 0 5
0 0 5
where D is diagonal and N is nilpotent with N 3 = O. However, D and N don’t commute and so
we can’t say eDt+N t = eDt eN t .
138
Chapter 10
Introduction to Discrete Time Models
Consider the following problem
V (k0 ) = max
c,k
subject to
T
X
β t U (ct )
t=0
ct + kt+1 ≤ f (kt )
k0 given
where u(·) is a strictly increasing, strictly concave, and twice differentiable utility function. ct
denotes consumption at time t, kt+1 denotes the level of capital carried into the next period, and
f (·) is an increasing function of current capital.
10.1 Deriving Euler Equations
We want to differentiate
ct
T
}|
{
z
X
β t U (f (kt ) − kt+1 )
V (k0 ) = max
kt+1
t=0
with respect to kt+1 , which means essentially differentiating
. . . β t U (f (kt ) − kt+1 ) + β t+1 U (f (kt+1 ) − kt+2 ) . . .
with respect to kt+1 . Thus, we have the following first order condition
∂
set
: −β t U ′ (ct ) + β t+1 U ′ (ct+1 )f ′ (kt+1 ) = 0,
∂kt+1
which, after simplification, becomes
U ′ (ct ) = βU ′ (ct+1 )f ′ (kt+1 ).
(10.1)
This equation is known as an Euler equation. It relates the marginal utility of consumption at time
t to the discounted marginal utility benefit of postponing consumption for one period. Alternatively,
it states that marginal cost of forgoing consumption today must equal to discounted marginal benefit
of investing.
139
A. W. Richter
10.1. DERIVING EULER EQUATIONS
10.1.1 The Dynamic Program: Part I
The Bellman equation for the model introduced above is
Vt (kt ) = max {U (ct ) + βVt+1 (kt+1 )}
ct ,kt+1
(10.2)
subject to ct + kt+1 = f (kt ).
Substituting for consumption we obtain
Vt (kt ) = max {U (f (kt ) − kt+1 ) + βVt+1 (kt+1 )} .
kt+1
(10.3)
The first order condition is given by
∂
set
′
: −U ′ (ct ) + βVt+1
(kt+1 ) = 0
∂kt+1
(10.4)
To illustrate the Envelope Condition, postulate a law of motion for capital given by
kt+1 = y(kt ),
which intuitively asserts that tomorrow’s capital stock is a function of today’s stock. Next, substitute
this “optimal investment plan” into the initial problem, equation (10.2) to get
Vt (kt ) = U (f (kt ) − y(kt )) + βVt+1 (y(kt )).
Optimizing with respect to the state variable kt results in
∂
:
∂kt
′
Vt′ (kt ) = U ′ (ct )[f ′ (kt ) − y ′ (kt )] + βVt+1
(kt+1 )y ′ (kt )
F OC=0
→
Vt′ (kt )
z
}|
{
′
= U (ct )f (kt ) + [−U ′ (ct ) + βVt+1
(kt+1 )]y ′ (kt )
′
′
The expression in square brackets is zero according to the first order conditions defined in equation
(10.4). Thus, the expression simplifies to
Vt′ (kt ) = U ′ (ct )f ′ (kt ),
(10.5)
In short, the Envelope Condition allows us to differentiate (10.3) directly with respect to the state
variable kt , ignoring its effect on kt+1 to get the exact same result.
If we roll equation (10.5) forward one period
′
Vt+1
(kt+1 ) = U ′ (ct+1 )f ′ (kt+1 ),
′ (k
we can use this result to substitute for Vt+1
t+1 ) in the first order condition (10.4) to obtain
U ′ (ct ) = βU ′ (ct+1 )f ′ (kt+1 ),
(10.6)
which is the same as the Euler equation seen above.
Remark 10.1.1. Note that we could have instead substituted out the choice variable kt+1 in (10.3)
and applied similar steps to obtain the Euler equation.
140
A. W. Richter
10.1. DERIVING EULER EQUATIONS
10.1.2 The Dynamic Program: Part II
In this section, we consider an alternative way of writing the optimization problem:
V (y0 ) = max
c,x
T
X
β t U (ct )
t=0
subject to ct + xt = yt ,
yt = f (xt−1 )
y0 given
where xt denotes the level of savings at time t, yt represents output at time t, and f (·) is an increasing function of past savings. In this case, the value function is given by
Vt (yt ) = max {U (ct ) + βVt+1 (yt+1 )}
ct ,xt
subject to ct + xt = yt
yt = f (xt−1 )
Thus, after substituting the constraints into the value function, we have
(
)
xt
z }|
{ Vt (yt ) = max U (ct ) + βVt+1 f yt − ct
ct
(10.7)
The first order condition is given by
∂
set
′
: U ′ (ct ) − βVt+1
(yt+1 )f ′ (xt ) = 0.
∂ct
(10.8)
In this case, in order to derive the Envelope condition, suppose ct = g(yt ), and substitute this policy
function into the value function, (10.7). Then we have


x

z }|t { 
Vt (yt ) = max
U g(yt ) + βVt+1 f yt − g(yt )
,
ct


so that the derivative with respect to the state variable yt becomes
∂
′
: Vt′ (yt ) = U ′ (ct )g′ (yt ) + βVt+1
(yt+1 ) f ′ (xt ) 1 − g ′ (yt )
∂yt
F OC=0
hz
}|
{i
∂
′
′
→
(yt+1 ) f ′ (xt ) g′ (yt ).
(yt+1 ) f ′ (xt ) + U ′ (ct ) − βVt+1
: Vt′ (yt ) = βVt+1
∂yt
Again, we can use the first order condition (10.8) to cancel the second term so that we have the
following Envelope condition:
′
(yt+1 ) f ′ (xt ) .
Vt′ (yt ) = βVt+1
(10.9)
Just as we noted in the previous example, we could have arrived at this result much quicker by
simply differentiating expression (10.7) with respect to the state variable yt ignoring the effect it
has on ct .
141
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
If we combine (10.9) and (10.8), we obtain
U ′ (ct ) = Vt′ (yt ).
Updating this result and plugging it into (10.9), we obtain the familiar Euler equation
U ′ (ct ) = βU ′ (ct+1 )f ′ (xt ).
(10.10)
Remark 10.1.2. Equations (10.1) and (10.10) are indeed identical since “savings” today is equivalent to the available capital tomorrow (xt = kt+1 ).
Remark 10.1.3. Note that we could have used xt−1 as our state variable instead of yt to once again
obtain the same Euler equation.
10.2 Solving Dynamic Models
There are numerous ways to solve dynamic models:
1. Method of undetermined coefficients (guess and verify method): Guess a functional form
for the value function and then verify that this functional form satisfies Bellman’s functional
equation.
2. Value function iteration (Bellman Iteration)
3. Euler equation iteration
4. Howard’s Improvement Algorithm: guess a functional form for the policy function and plug
it into the Euler equation.
It is important to note that some solution methods will be far easier to apply certain models. Below
is set of examples that are designed to help students develop the basic techniques of each of the
above solution methods.
10.2.1 Method of undetermined coefficients
Cake-Eating Problem
V (w0 ) = max
c,w
∞
X
β t ln(ct )
t=0
subject to ct + wt+1 = wt
w0 given
The value function is
V (wt ) = max
ct ,wt+1
{ln(ct ) + βV (wt+1 )}
subject to ct + wt+1 = wt .
After substitution of ct , we have
V (wt ) = max {ln(wt − wt+1 ) + βV (wt+1 )}
wt+1
142
(10.11)
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
Then if we guess that the value function has function form: V (w) = A + B ln w the above value
function becomes:
A + B ln wt = max {ln(wt − wt+1 ) + β(A + B ln wt+1 )}
wt+1
(10.12)
The first order condition is:
∂
:
∂wt+1
−
1
βB set
= 0.
+
wt − wt+1 wt+1
Thus, after simplification, we obtain
wt+1 =
βBwt
1 + βB
Plugging the result of (10.13) into the value function (10.12), we obtain
wt
βBwt
A + B ln wt = ln
+ βA + βB ln
1 + βB
1 + βB
Equating coefficients, we find
B = 1 + βB
1
→B=
1−β
1
βB
A = ln
+ βA + βB
1 + βB
1 + βB
1
β
→A=
ln β
ln(1 − β) +
1−β
1−β
Thus, plugging the value of B into (10.13) and utilizing the budget constraint, we obtain
wt+1 = βwt
ct = (1 − β)wt ,
implying that agents’ saving is determined by their degree of patience.
Neoclassical Growth Model with log utility
V (k0 ) = max
′
c,k
∞
X
β t ln(ct )
t=0
subject to ct + kt+1 = Aktα
k0 given
The value function is
V (kt ) = max {ln(ct ) + βV (kt+1 }
ct ,kt+1
subject to ct + kt+1 = Aktα .
143
(10.13)
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
After substitution of ct , we have
V (kt ) = max {ln(Aktα − kt+1 ) + βV (kt+1 )}
kt+1
(10.14)
Then if we guess that the value function has function form: V (k) = E + F ln k the above value
function becomes:
E + F ln kt = max {ln(Aktα − kt+1 ) + β(E + F ln kt+1 )}
kt+1
(10.15)
The first order condition is:
∂
1
βF set
= 0.
: −
+
α
∂kt+1
Akt − kt+1 kt+1
Thus, after simplification, we obtain
kt+1 =
Aktα βF
1 + βF
(10.16)
Plugging the result of (10.16) into the value function (10.15), we obtain
α
Akt βF
Aktα
+ βE + βF ln
E + F ln kt = ln
1 + βF
1 + βF
Equating coefficients, we find
F = α + αβF
α
→F =
1 − αβ
A
AβF
+ βE + βF
E = ln
1 − βF
1 + βF
1
αβ
→E=
ln(αβA)
ln (1 − αβ)A +
1−β
1 − αβ
Thus, plugging the value of B into (10.16) and utilizing the budget constraint, we obtain
kt+1 = αβAktα = αβyt
ct = (1 − αβ)Aktα = (1 − αβ)yt
Neoclassical Growth Model with CES utility (α = 1)
V (k0 ) = max
′
c,k
∞
X
c1−σ
βt t
1−σ
t=0
subject to ct + kt+1 = Akt .
The value function is
V (kt ) = max
ct ,kt+1
c1−σ
t
+ βV (kt+1 )
1−σ
144
(10.17)
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
subject to ct + kt+1 = Aktα .
After substitution of ct , we have
V (wt ) = max
kt+1
(Akt − kt+1 )1−σ
+ βV (kt+1 )
1−σ
(10.18)
1−σ
Then if we guess that the value function has function form: V (k) = Bk
1−σ the above value function
becomes:
)
(
1−σ
βBkt+1
(Akt − kt+1 )1−σ
Bkt1−σ
(10.19)
= max
+
kt+1
1−σ
1−σ
1−σ
The first order condition is:
∂
−σ set
: (Akt − kt+1 )−σ (−1) + βBkt+1
= 0.
∂kt+1
Thus, after simplification, we obtain
kt+1 =
Akt
1 + (βB)−1/σ
Plugging the result of (10.20) into the value function (10.19), we obtain


!1−σ
1−σ
1−σ
−1/σ
Akt
1  (βB)
Akt
Bkt

+ βB
=
1−σ
1−σ
1 + (βB)−1/σ
1 + (βB)−1/σ
i1−σ
h
h
i1−σ
→ 1 + (βB)−1/σ
B = (βB)−1/σ A
+ βBA1−σ
i
h
i1−σ
h
= βA1−σ 1 + (βB)−1/σ
→ 1 + (βB)−1/σ
i−σ
h
→ 1 + (βB)−1/σ
= βA1−σ
σ−1
→ (βB)−1/σ = β −1/σ A σ − 1
i−σ
1 h −1/σ σ−1
→ B=
β
A σ −1
β
Thus, plugging the value of B into (10.20), we obtain
kt+1 =
A
β −1/σ A
σ−1
σ
kt
= (βA)1/σ kt
= β 1/σ A
and using the budget constraint
1−σ
σ
yt ,
1−σ
1
ct = 1 − β σ A σ yt
145
(10.20)
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
10.2.2 Value Function Iteration
We will again consider the Neoclassical Growth Model with log utility presented above. However,
we will solve this model in a very different way.
V (k0 ) = max
′
c,k
∞
X
β t ln(ct )
t=0
subject to ct + kt+1 = Aktα .
Start with an initial guess for the value function at time 0: V0 (k) = 0. This guess acts as a starting
point, which is the same role a state variable plays. Using the budget constraint and this guess, we
have
V1 (k) = max
ln(Ak α − k′ ) + βV0 (k′ )
′
k
α
′
= max
ln(Ak
−
k
)
′
k
Since the logarithm function is increasing, the maximum for ln(Ak α − k′ ) occurs at k′ = 0. Thus,
we have
V1 (k) = ln(Ak α ) = ln A + α ln k
c = Ak α
Continuing to iterate, we obtain
V2 (k) = max
′
k
= max
′
k
The first order condition is:
After simplification, we obtain
ln(Ak α − k′ ) + βV1 (k′ )
ln(Ak α − k′ ) + β(ln A + α ln k′ )
(10.21)
1
αβ set
(−1) + ′ = 0.
Ak α − k′
k
k′ =
αβAkα
1 + αβ
and
c=
Ak α
1 + αβ
Thus, plugging the value of k′ into (10.21), we obtain
αβAk α
Ak α
+ β ln A + αβ ln
V2 (k) = ln
1 + αβ
1 + αβ
A
αβA
= ln
+ β ln A + αβ ln
+ α(1 + αβ) ln k
1 + αβ
1 + αβ
Continuing to iterate, we obtain
V3 (k) = max
ln(Ak α − k′ ) + βV2 (k′ )
′
k
A
α
′
+ β ln A
= max
ln(Ak − k ) + β ln
1 + αβ
k′
αβA
+ αβ ln
+ α(1 + αβ) ln k′
1 + αβ
146
(10.22)
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
The first order condition is:
1
αβ(1 + αβ) set
= 0.
(−1) +
Akα − k′
k′
Thus, after simplification, we obtain
αβ(1 + αβ)Ak α
1 + αβ(1 + αβ)
Akα
c=
1 + αβ(1 + αβ)
k′ =
Thus, plugging the value of k′ into (10.22), we obtain
Akα
A
αβA
V3 (k) = ln
+ β β ln A + ln
+ αβ ln
+
1 + αβ(1 + αβ)
1 + αβ
1 + αβ
αβ(1 + αβ)Akα
αβ(1 + αβ) ln
1 + αβ(1 + αβ)
= constant + α 1 + αβ + (αβ)2 ln k
Continuing to iterate infinitely many times, we eventually obtain
α
′
′
V (k) = max
ln(Ak
−
k
)
+
βV
(k
)
k′
α
′
= max
ln(Ak
−
k
)
+
β
constant +
′
k
α
ln k′
1 − αβ
The first order condition is:
αβ 1 set
1
(−1) +
= 0.
′
−k
1 − αβ k′
Ak α
Thus, after simplification, we obtain
k′ = αβAk α = αβyt
c = Ak α (1 − αβ) = (1 − αβ)yt .
Notice that this is the same solution that we arrived at in system (10.17).
10.2.3 Euler Equation Iteration
Consider again the Neoclassical Growth Model with log utility that was introduced above.
V (k0 ) = max
′
c,k
∞
X
β t ln(ct )
t=0
subject to ct + kt+1 = Aktα = yt .
From equation (10.6), we get the following Euler equation
1
ct
1
α−1
Aαkt+1
ct+1
1 yt+1
.
= αβ
ct+1 kt+1
= β
147
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
Using the budget constraint, it follows that
kt+1
ct+1 + kt+2
= αβ
ct
ct+1
(10.23)
Adding 1 to both sides of (10.23), we obtain
ct + kt+1
ct+1 + kt+2
= 1 + αβ
ct
ct+1
Defining zt =
ct +kt+1
ct
=
yt
ct
and iterating, we get
zt = 1 + αβzt+1
= 1 + αβ + (αβ)2 zt+2
= 1 + αβ + (αβ)2 + (αβ)3 zt+3
..
.
∞
X
(αβ)t + lim (αβ)T zt+T
=
T →∞
t=0
=
1
,
1 − αβ
provided that βα < 1 (we know it’s positive), which ensures that the infinite sum converges. Using
the definition of zt , it is easy to see that
ct = (1 − αβ)yt
kt+1 = αβyt
10.2.4 Howard’s Improvement Algorithm
Neoclassical Growth Model with CES utility
V (k0 ) = max
′
c,k
∞
X
βt
t=0
c1−σ
t
1−σ
subject to ct + kt+1 = Akt = yt
k0 given
From equation (10.6), we get the following Euler equation
c−σ
= βc−σ
t
t+1 A,
which implies
1
ct = (βA)− σ ct+1 .
Next guess that ct = θyt , where θ is some unknown constant that we must determine. Then the
Euler equation becomes
1
θyt = (βA)− σ θyt+1 ,
148
A. W. Richter
10.2. SOLVING DYNAMIC MODELS
which, after substituting in the production function, becomes
1
θAkt = (βA)− σ θAkt+1 ,
so that the period-to-period capital ratio is
1
kt
= (βA)− σ .
kt+1
(10.24)
Now using the budget constraint and replacing ct = θyt we have
θyt + kt+1 = Akt
→
→
→
θAkt + kt+1 = Akt
kt+1 = (A − θA) kt
kt+1
= (A − θA) .
kt
(10.25)
Combine (10.24) and (10.25) and get
1
(βA) σ = A − θA
1
→
→
A − (βA) σ
A
1
1−σ
θ = 1 − (β) σ A σ .
θ=
Substituting for θ in our policy guess and using the budget constraint, we obtain
1−σ
1
ct =
1 − β σ A σ yt ,
1
kt+1 = β σ A
1−σ
σ
yt ,
which is the same solution we found in (Equation 10.2.1). However, it is clear that this method is
far less tedious.
Neoclassical Growth Model with log utility
V (k0 ) = max
′
c,k
∞
X
β t ln(ct )
t=0
subject to ct + kt+1 = Aktα = yt
k0 given
We have already seen that the Euler equation is
1
ct
1
α−1
Aαkt+1
ct+1
1 yt+1
= αβ
.
ct+1 kt+1
= β
which implies
ct =
1 ct+1 kt+1
αβ yt+1
149
A. W. Richter
10.3. ADDITIONAL EXAMPLES
Again guess that ct = θyt . Then the Euler equation becomes
θyt =
1 θyt+1 kt+1
,
αβ yt+1
which clearly implies
kt+1 = αβyt
ct = (1 − αβ)yt ,
which is the same solution as defined in system (10.17).
10.3 Additional Examples
10.3.1 Model 1: Inelastic Labor, Capital Adjustment Costs
max
∞
X
β t ln ct
i=0
subject to ct + it = Aktα
kt+1 = kt1−δ iδt
k0 given
Bellman Equation: Guess and Verify
V (kt ) = max {ln(Aktα − it ) + βV (kt+1 )}
it ,kt+1
= max{ln(Aktα − it ) + βV (kt1−δ iδt )}
it
Guess V (kt ) = E + F ln(kt ) to obtain
V (kt ) = max {ln(Aktα − it ) + β(E + F ln(kt+1 ))}
it ,kt+1
= max {ln(Aktα − it ) + βE + βF [(1 − δ) ln kt + δ ln(it )]}
it ,kt+1
The first order condition reads
βδF
1
=
α
Akt − it
it
→
it =
βδF Aktα
1 + βδF
Plugging the result back into the value function gives
βδF Aktα
Aktα
+ βE + βF (1 − δ) ln kt + βδF ln
E + F ln kt = ln
1 + βδF
1 + βδF
150
A. W. Richter
10.3. ADDITIONAL EXAMPLES
Equating coefficients yields
F = α + β(1 − δ)F + αβδF → F =
Plugging back into the first order condition, we obtain
it =
α
1 − β(1 − δ) − αβδ
αβδ
αβδAktα
=
yt
1 − β(1 − δ)
1 − β(1 − δ)
Using the budget constraint we have
1 − β(1 − δ) − αβδ
ct =
yt
1 − β(1 − δ)
Lagrangian Method: Euler Equation Iteration
L=
∞
X
t=0
n
o
β t ln ct + λt (Aktα − ct − it ) + µt (kt1−δ iδt − kt+1 )
The first order conditions give
1
= λt
ct :
ct
it :
kt+1 :
δµt kt+1
it
it
δct kt+1
αβλt+1 yt+1 β(1 − δ)µt+1 kt+2
α−1
−δ δ
+
µt = βλt+1 αAkt+1
+ βµt+1 (1 − δ)kt+1
it+1 =
kt+1
kt+1
λt = µt kt1−δ δitδ−1 =
→
µt =
Combining yields
it
yt+1
it+1
= αβδ
+ β(1 − δ)
ct
ct+1
ct+1
Using the budget constraint and a bit of algebra gives
ct + it
= 1 − β(1 − δ) + [αβδ + β(1 − δ)]
ct
ct+1 + it+1
ct+1
Iterating yields
∞
X
1 − β(1 − δ)
yt
[αβδ + β(1 − δ)]i =
= [1 − β(1 − δ)]
ct
1 − αβδ − β(1 − δ)
i=0
Thus, using the budget constraint, we obtain the same solution path.
10.3.2 Model 2: Elastic Labor, Capital Adjustment Costs
max E0
∞
X
β t (ln ct + γℓt )
i=0
σ
subject to ct + xt = yt ≡ At kt−1
n1−σ
t
1−δ
kt = h(xt , kt−1 ) ≡ xδt kt−1
where yt represents output, nt measures labor hours worked, At is a random TFP shock, and h(t)
is the law of motion for the capital stock.
151
A. W. Richter
10.3. ADDITIONAL EXAMPLES
Lagrangian Method: Iteration of Euler Equation
L=
∞
X
t=0
n
o
σ
δ 1−δ
n1−σ
−
c
−
x
)
+
µ
(x
k
−
k
)
β t ln ct + γℓt + λt (At kt−1
t
t
t t t−1
t
t
The first order conditions give
1
= λt
ct
σ
γ = λt At kt−1
(1 − σ)n−σ
t
1−δ
λt = µt δxtδ−1 kt−1
δ
−δ
µt = βλt+1 At+1 σktσ−1 n1−σ
t+1 + βµt+1 xt+1 (1 − δ)kt
Combining these results yields
(1 − σ)yt
ct nt
xt
µt =
δct h(t)
βσyt+1 β(1 − δ)µt+1 h(t + 1)
+
µt =
ct+1 kt
kt
γ=
Further simplification gives
xt
βσyt+1 β(1 − δ)xt+1
=
+
δct h(t)
ct+1 kt
δct+1 kt
Multiplying by kt and using the fact that kt = h(t) in equilibrium, we obtain
βσyt+1 β(1 − δ)xt+1
xt
=
+
δct
ct+1
δct+1
A bit of algebra then yields
xt + ct
= 1 − β(1 − δ) + βδ
ct
σ+
1−δ
δ
ct+1 + kt+1
ct+1
Iterating, we have
∞ 1−δ j
xt + ct X
(1 − β(1 − δ))
=
βδ σ +
ct
δ
j=0
=
∞
X
j=0
=
[β(σδ + 1 − δ)]j (1 − β(1 − δ))
1 − β(1 − δ)
≡η
1 − β(σδ + 1 − δ)
Thus, we have the following solution paths
1
yt
η
xt = [1 − 1/η]yt
η(1 − σ)
nt =
γ
ct =
152
A. W. Richter
10.3. ADDITIONAL EXAMPLES
10.3.3 Model 3: Habit Persistence
max
∞
X
i=0
β t {ln ct + γ ln ct−1 }
subject to ct + kt+1 = Aktα
Bellman Equation: Iteration of Euler Equation
V (ct−1 , kt ) = max{ln ct + γ ln ct−1 + βV (ct , kt+1 )
kt+1
= max{ln(Aktα − kt+1 ) + γ ln ct−1 + βV (Aktα − kt+1 , kt+1 )}
kt+1
The first order condition gives
1
+ βV1 (ct , kt+1 ) = βV2 (ct , kt+1 )
Aktα − kt+1
and the envelope conditions yield
αAktα−1
+ αβV1 (ct , kt+1 )Aktα−1
Aktα − kt+1
γ
V1 (ct−1 , kt ) =
ct−1
V2 (ct−1 , kt ) =
Combining these results and updating yields
V2 (ct , kt+1 ) =
α(1 + βγ)yt+1
ct+1 kt+1
Plugging this back into the first order condition gives
1
αβyt+1
=
ct
ct+1 kt+1
which implies that
ct + kt+1
= 1 + αβ
ct
ct+1 + kt+2
ct+1
Thus, we have the following solutions
ct = (1 − αβ)yt
kt+1 = αβyt
153
Chapter 11
Linear Discrete Time Models
11.1 Analytical Solution Methods
11.1.1 Model Setup
Consider a standard growth model where the consumer seeks to maximize
E0
∞
X
β t log(Ct )
t=0
subject to
Ct + Kt = (1 − τt )Rt Kt−1 + Tt + Πt ,
where τ is a proportional tax levied on income, R is the gross rental price of capital, and T is
lump sum transfers from the government, and Πt are the representative firm’s profits that have been
rebated back to the consumer. The first order conditions are given by
1
= λt
Ct
λt = βEt λt+1 (1 − τt+1 )Rt+1 ,
where λ is the Lagrange multiplier on the consumer’s budget constraint. Combining these results
yields the following Euler equation
1
1
= βEt
(1 − τt+1 )Rt+1 .
(11.1)
Ct
Ct+1
The firm maximizes profits given by
Yt − Rt Kt−1
subject to the following output constraint
α
Yt = At Kt−1
,
where Y is output and A is an exogenous i.i.d mean zero technology shock. The firm’s first order
condition then implies
α−1
Rt = αAt Kt−1
.
154
(11.2)
A. W. Richter
11.1. ANALYTICAL SOLUTION METHODS
Combining (11.1) and (11.2) yields
1
1
α−1
= βEt (1 − τt+1 )
αAt+1 Kt
Ct
Ct+1
1 Yt+1
= βEt (1 − τt+1 )
.
Ct+1 Kt
(11.3)
The aggregate resource constraint is given by
α
Ct + Kt = Yt = At Kt−1
,
(11.4)
which is verified by combining the consumer’s budget constraint and the government’s budget constraint
Tt = τt Rt Kt−1
and noting that Πt = (1 − α)Yt .
11.1.2 Log-Linear System
Log-linearizing (11.3) yields
−
A
K α−1
1
α−2
Cc
=
αβ(1
−
τ
)
(α
−
1)K
Kk
+
αβ(1
−
τ
)
AEt at+1
t
t
C2
C
C
1
AK α−1
− αβ(1 − τ )AK α−1 2 CEt ct+1 − αβ
τ Et τt+1
C
C
τ
AK α−1
Et τt+1 ,
(α − 1)kt + Et at+1 − Et ct+1 −
= αβ(1 − τ )
C
1−τ
which after making use of its steady state condition reduces to
−ct = (α − 1)kt + Et at+1 − Et ct+1 −
τ
Et τ̂t+1 .
1−τ
Linearizing (11.4) gives
Cct + Kkt = AαK α−1 Kkt−1 + K α Aat
= AK α−1 K(αkt−1 + at )
K
(αkt−1 + at ),
=
αβ(1 − τ )
where the last line follows from imposing steady state on (11.3). Using the fact that
C
1
1 − αβ(1 − τ )
= AK α−1 − 1 =
−1=
,
K
αβ(1 − τ )
αβ(1 − τ )
the above result can be rewritten as
1
C
ct =
(αkt−1 + at ) − kt
K
αβ(1 − τ )
1 − αβ(1 − τ )
1
→
(αkt−1 + at ) − kt
ct =
αβ(1 − τ )
αβ(1 − τ )
155
(11.5)
A. W. Richter
11.1. ANALYTICAL SOLUTION METHODS
→ ct =
α
αβ(1 − τ )
1
at +
kt−1 −
kt
1 − αβ(1 − τ )
1 − αβ(1 − τ )
1 − αβ(1 − τ )
(11.6)
Thus, the forecast error in consumption can be written
α
αβ(1 − τ )
1
(at − Et at+1 ) +
(kt−1 − kt ) −
(kt − Et kt+1 )
1 − αβ(1 − τ )
1 − αβ(1 − τ )
1 − αβ(1 − τ )
1
α + αβ(1 − τ )
α
αβ(1 − τ )
=
(at − Et at+1 ) +
kt−1 −
kt +
Et kt+1 .
1 − αβ(1 − τ )
1 − αβ(1 − τ )
1 − αβ(1 − τ )
1 − αβ(1 − τ )
ct − Et ct+1 =
Rearranging (11.5), the forecast error in consumption can be equivalently written as
\[
c_t - E_tc_{t+1} = \frac{\tau}{1-\tau}E_t\tau_{t+1} + (1-\alpha)k_t - E_ta_{t+1}.
\]
Setting the previous two expressions equal to each other and multiplying both sides by $[1-\alpha\beta(1-\tau)]/(\alpha\beta)$ yields
\[
\frac{1-\alpha\beta(1-\tau)}{\alpha\beta}\frac{\tau}{1-\tau}E_t\tau_{t+1} + \frac{[1-\alpha\beta(1-\tau)](1-\alpha)}{\alpha\beta}k_t - \frac{1-\alpha\beta(1-\tau)}{\alpha\beta}E_ta_{t+1}
\]
\[
= \frac{1}{\alpha\beta}(a_t - E_ta_{t+1}) + \frac{1}{\beta}k_{t-1} - \frac{\alpha+\alpha\beta(1-\tau)}{\alpha\beta}k_t + (1-\tau)E_tk_{t+1},
\]
which, after simplifying, implies
\[
(1-\tau)E_tk_{t+1} - \frac{1+\alpha^2\beta(1-\tau)}{\alpha\beta}k_t + \frac{1}{\beta}k_{t-1}
= \frac{1-\alpha\beta(1-\tau)}{\alpha\beta}\frac{\tau}{1-\tau}E_t\tau_{t+1} + (1-\tau)E_ta_{t+1} - \frac{1}{\alpha\beta}a_t.
\]
Dividing by $1-\tau$ yields
\[
E_tk_{t+1} - (\theta^{-1}+\alpha)k_t + \alpha\theta^{-1}k_{t-1}
= \theta^{-1}(1-\theta)\frac{\tau}{1-\tau}E_t\tau_{t+1} + E_t[a_{t+1} - \theta^{-1}a_t],
\]
where $\theta \equiv \alpha\beta(1-\tau)$. Written more compactly, and noting that $E_ta_{t+1} = 0$ because $a_t$ is i.i.d., we obtain
\[
E_tk_{t+1} - \gamma_0k_t + \gamma_1k_{t-1} = \nu_2a_t + \nu_1E_t\tau_{t+1} \equiv E_tx_t, \tag{11.7}
\]
where
\[
\gamma_0 = \theta^{-1} + \alpha, \qquad \gamma_1 = \alpha\theta^{-1}, \qquad \nu_1 = \theta^{-1}(1-\theta)\frac{\tau}{1-\tau}, \qquad \nu_2 = -\theta^{-1}.
\]
Thus, the equilibrium is characterized by a second order difference equation in capital.
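As a quick illustration, the sketch below computes the coefficients of (11.7) for illustrative parameter values and confirms that the characteristic polynomial $\lambda^2 - \gamma_0\lambda + \gamma_1 = 0$ has one stable root equal to $\alpha$ and one unstable root equal to $\theta^{-1}$.

    alpha = 0.36; beta = 0.96; tau = 0.2;     % illustrative parameters
    theta  = alpha*beta*(1-tau);
    gamma0 = 1/theta + alpha;
    gamma1 = alpha/theta;
    nu1    = (1/theta)*(1-theta)*tau/(1-tau);
    nu2    = -1/theta;
    % Roots of lambda^2 - gamma0*lambda + gamma1 = 0
    lambda = sort(roots([1, -gamma0, gamma1]));
    fprintf('roots: %.4f and %.4f (alpha = %.4f, 1/theta = %.4f)\n', ...
            lambda(1), lambda(2), alpha, 1/theta)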
11.1.3 Solution Method I: Direct Approach
Using lag operators, (11.7) can be written as
\[
E_tx_t = (L^{-2} - \gamma_0L^{-1} + \gamma_1)E_tk_{t-1} = (\lambda_1 - L^{-1})(\lambda_2 - L^{-1})E_tk_{t-1},
\]
where
\[
\gamma_1 = \lambda_1\lambda_2 = \alpha\theta^{-1}, \qquad \gamma_0 = \lambda_1 + \lambda_2 = \theta^{-1} + \alpha.
\]
Thus, it is easy to see that $\lambda_1 = \alpha < 1$ and $\lambda_2 = \theta^{-1} = [\alpha\beta(1-\tau)]^{-1} > 1$. Inverting the unstable factor in the above equation then yields
\begin{align*}
k_t &= \lambda_1k_{t-1} - (\lambda_2 - L^{-1})^{-1}E_tx_t\\
&= \lambda_1k_{t-1} - \frac{\lambda_2^{-1}}{1-(\lambda_2L)^{-1}}E_tx_t\\
&= \lambda_1k_{t-1} - \lambda_2^{-1}\sum_{j=0}^{\infty}(\lambda_2L)^{-j}E_tx_t\\
&= \alpha k_{t-1} - \alpha\beta(1-\tau)\sum_{j=0}^{\infty}[\alpha\beta(1-\tau)]^jE_tx_{t+j}\\
&= \alpha k_{t-1} - \sum_{j=0}^{\infty}[\alpha\beta(1-\tau)]^{j+1}E_t\left[\theta^{-1}(1-\theta)\frac{\tau}{1-\tau}\tau_{t+1+j} - \theta^{-1}a_{t+j}\right]\\
&= \alpha k_{t-1} + a_t - (1-\theta)\frac{\tau}{1-\tau}\sum_{j=0}^{\infty}\theta^jE_t\tau_{t+1+j},
\end{align*}
where the last line uses $E_ta_{t+j} = 0$ for $j \ge 1$.
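Evaluating the discounted sum requires a process for taxes, which the text does not specify. Purely as an illustration, suppose the tax deviation follows an AR(1), $\tau_{t+1} = \rho_\tau\tau_t + u_{t+1}$, so that $E_t\tau_{t+1+j} = \rho_\tau^{j+1}\tau_t$ and the sum collapses to $\rho_\tau\tau_t/(1-\theta\rho_\tau)$. The sketch below simulates the resulting capital path under this assumption.

    % Hypothetical AR(1) tax process -- an assumption for illustration only
    alpha = 0.36; beta = 0.96; tau = 0.2;       % illustrative parameters
    theta = alpha*beta*(1-tau);
    rho_tau = 0.8; T = 200;
    rng(1);
    a      = 0.01*randn(T,1);                   % i.i.d. technology deviations
    tauhat = zeros(T,1); k = zeros(T,1);        % deviations from steady state
    for t = 2:T
        tauhat(t) = rho_tau*tauhat(t-1) + 0.005*randn;
        % sum_{j>=0} theta^j E_t tau_{t+1+j} = rho_tau*tauhat(t)/(1-theta*rho_tau)
        taxsum = rho_tau*tauhat(t)/(1 - theta*rho_tau);
        k(t) = alpha*k(t-1) + a(t) - (1-theta)*(tau/(1-tau))*taxsum;
    end
    plot(k); xlabel('t'); ylabel('k_t (deviation from steady state)');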
11.1.4 Solution Method II: MSV Approach
To focus attention on the technique, assume τt = 0 for all t. Then (11.7) reduces to
\[
E_tk_{t+1} - \left(\frac{1}{\alpha\beta} + \alpha\right)k_t + \frac{1}{\beta}k_{t-1} = -\frac{1}{\alpha\beta}a_t.
\]
Instead of solving this difference equation directly, posit a Minimum State Variable (MSV) solution of the form
\[
k_t = b_1k_{t-1} + b_2a_t,
\]
where $b_1$ and $b_2$ are unknown coefficients that we will have to pin down. After plugging our guess into the difference equation, we arrive at
\[
[b_1 - (\alpha\beta)^{-1} - \alpha][b_1k_{t-1} + b_2a_t] + \beta^{-1}k_{t-1} = -(\alpha\beta)^{-1}a_t.
\]
Equating the coefficients on the state variables yields
\[
[b_1 - \alpha][b_1 - (\alpha\beta)^{-1}] = 0, \qquad [b_1 - (\alpha\beta)^{-1} - \alpha]b_2 = -(\alpha\beta)^{-1}.
\]
Using the stable root, it is easy to see that b1 = α and b2 = 1. Therefore,
kt+1 = αkt + at+1 ,
which is the same solution we arrive at using Jordan decomposition. In order to obtain the path of
consumption, once again posit an MSV solution given by
ct = d1 kt−1 + d2 at ,
where $d_1$ and $d_2$ are unknown coefficients that we will have to pin down. After plugging our guess into (11.5), we arrive at
\[
d_1k_{t-1} + d_2a_t - d_1[\alpha k_{t-1} + a_t] = (1-\alpha)[\alpha k_{t-1} + a_t].
\]
After equating coefficients on the state variables, it is easy to see that $d_1 = \alpha$ and $d_2 = 1$. Therefore,
\[
c_{t+1} = \alpha k_t + a_{t+1}.
\]
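The MSV calculation is easy to replicate numerically: solve the quadratic for $b_1$, keep the stable root, and back out the remaining coefficients. A sketch with illustrative parameter values:

    alpha = 0.36; beta = 0.96;        % illustrative parameters
    % b1 solves b1^2 - (1/(alpha*beta) + alpha)*b1 + 1/beta = 0
    b1 = min(roots([1, -(1/(alpha*beta) + alpha), 1/beta]));   % stable root (= alpha)
    b2 = -(1/(alpha*beta))/(b1 - 1/(alpha*beta) - alpha);      % = 1
    % Consumption coefficients: c_t = d1*k_{t-1} + d2*a_t
    d1 = alpha;                       % from (1-alpha)*d1 = alpha*(1-alpha)
    d2 = (1-alpha) + d1;              % from d2 - d1 = 1-alpha, so d2 = 1
    fprintf('b1 = %.3f, b2 = %.3f, d1 = %.3f, d2 = %.3f\n', b1, b2, d1, d2)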
11.2 Numerical Solution Method
11.2.1 Introduction to Gensys
Gensys is user-written Matlab code designed to solve stochastic linear rational expectations
models. The key to using this program is to map your model into the form
G0 Xt+1 = G1 Xt + Ψεt+1 + Πηt+1 + C,
where X is a vector of variables (exogenous and endogenous), ε is a vector of exogenous random
variables (shocks), C is a vector of constants (often zero), and η is a vector of forecast errors whose
elements satisfy
\[
\eta^x_{t+1} = x_{t+1} - E_tx_{t+1}
\]
for some $x \in X$. Assuming the exogenous random variables are i.i.d. with mean zero, Gensys will
output the coefficients, G and M , of the following equation:
Xt+1 = GXt + M εt+1 + C.
The simplicity of using this program is illustrated in the following example.
11.2.2 Model Setup
Consider the following stochastic neoclassical growth model. A representative agent maximizes
\[
E_t\sum_{i=0}^{\infty}\beta^iU(C_{t+i}, 1-N_{t+i}) \tag{11.8}
\]
subject to
\[
C_t + \underbrace{K_t - (1-\delta)K_{t-1}}_{I_t} \le F(K_{t-1}, N_t, A_t), \tag{11.9}
\]
where
\[
U(C_t, 1-N_t) = \ln C_t + \theta\ln(1-N_t), \qquad
Y_t = F(K_{t-1}, N_t, A_t) = A_tK_{t-1}^{1-\alpha}N_t^{\alpha}, \tag{11.10}
\]
with $\theta \ge 0$ and $0 \le \alpha \le 1$. The technology shock is assumed to follow an AR(1) process. In terms of deviations from steady state, the technology process is
\[
a_t = \rho a_{t-1} + \varepsilon_t, \tag{11.11}
\]
where $0 \le \rho \le 1$ and $\varepsilon_t \sim N(0, \sigma_A^2)$.
Assuming the utility function is strictly increasing, the budget constraint holds with equality. Thus,
after substituting the constraint into the objective function, the Bellman equation can be written as
\[
V(K_{t-1}, A_t) = \max_{K_t, N_t}\left\{\ln\!\left(A_tK_{t-1}^{1-\alpha}N_t^{\alpha} - K_t + (1-\delta)K_{t-1}\right) + \theta\ln(1-N_t) + \beta E_tV(K_t, A_{t+1})\right\}.
\]
The first order conditions are given by
\[
\frac{1}{C_t} = \beta E_tV_K(K_t, A_{t+1}, N_{t+1})
\]
\[
\frac{\theta}{1-N_t} = \frac{\alpha A_tK_{t-1}^{1-\alpha}N_t^{\alpha-1}}{C_t}.
\]
Making use of the envelope theorem, we obtain
\[
V_K(K_{t-1}, A_t, N_t) = \frac{1}{C_t}\left[(1-\alpha)A_tN_t^{\alpha}K_{t-1}^{-\alpha} + (1-\delta)\right],
\]
which, after being updated one period, can be substituted back into the first order condition for capital to obtain
\[
\frac{1}{C_t} = \beta E_t\frac{1}{C_{t+1}}\left[(1-\alpha)A_{t+1}N_{t+1}^{\alpha}K_t^{-\alpha} + (1-\delta)\right].
\]
After simplifying, the first order conditions of the consumer’s problem can be written
\[
\frac{1}{C_t} = \beta E_t\frac{1}{C_{t+1}}R_{t+1} \tag{11.12}
\]
\[
\frac{\theta}{1-N_t} = \frac{\alpha Y_t}{C_tN_t} \tag{11.13}
\]
\[
R_t = (1-\alpha)\frac{Y_t}{K_{t-1}} + (1-\delta). \tag{11.14}
\]
Equations (11.12)-(11.14), the production function, the aggregate resource constraint, and the technology process make up a system of 6 equations and 6 variables, which can be solved numerically
using Gensys.
11.2.3 Deterministic Steady State
Obtaining steady state values can sometimes be tricky. It is important to work from the simplest
steady state equations to the most complicated. In this case, notice that (11.12) implies
R = 1/β.
Using (11.14) we can calculate $K/Y = (1-\alpha)(R+\delta-1)^{-1}$. Thus, using the aggregate resource constraint, $C/Y = 1 - \delta K/Y$. Using the production function,
\[
\frac{N}{Y} = \left(\frac{K}{Y}\right)^{(\alpha-1)/\alpha}.
\]
Setting $N = 1/3$, which is consistent with a standard work day, from (11.13)
\[
\theta = \alpha\left(\frac{C}{Y}\right)^{-1}\frac{1-N}{N}.
\]
Using $N$ and $N/Y$ we can calculate steady state $Y$ and hence $C$ and $K$. Assuming the parameter settings $\alpha = .666$, $\beta = .99$, and $\delta = .1$, straightforward numerical calculations imply
\[
(K, N, C, Y) = (1.82, 0.33, 0.41, 0.59).
\]
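These calculations are easy to reproduce; the sketch below follows the same steps (R, then K/Y, C/Y, and N/Y, then the implied levels and θ). Because the parameter values are rounded, the resulting levels may differ slightly from the numbers reported above.

    alpha = 0.666; beta = 0.99; delta = 0.1; N = 1/3;
    R  = 1/beta;                        % from (11.12)
    KY = (1-alpha)/(R + delta - 1);     % K/Y from (11.14)
    CY = 1 - delta*KY;                  % C/Y from the resource constraint
    NY = KY^((alpha-1)/alpha);          % N/Y from the production function
    Y = N/NY;  C = CY*Y;  K = KY*Y;     % levels implied by N = 1/3
    theta = alpha*(1/CY)*(1-N)/N;       % from (11.13)
    fprintf('K = %.2f, N = %.2f, C = %.2f, Y = %.2f, theta = %.2f\n', K, N, C, Y, theta)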
11.2.4 Log-Linear System
Log linearizing (11.12)-(11.14), the aggregate resource constraint, and the production function, we
obtain
\[
-c_t = -E_tc_{t+1} + E_tr_{t+1}
\]
\[
\frac{1}{1-N}n_t = y_t - c_t
\]
\[
Cc_t + Kk_t = Yy_t + (1-\delta)Kk_{t-1}
\]
\[
y_t = a_t + (1-\alpha)k_{t-1} + \alpha n_t
\]
\[
Rr_t = (1-\alpha)(Y/K)(y_t - k_{t-1}),
\]
where a lowercase letter denotes a log deviation from the deterministic steady state. Introducing forecast errors, the linearized consumption Euler equation can be written
\[
c_{t+1} - r_{t+1} = c_t + \eta^c_{t+1} - \eta^r_{t+1}.
\]
11.2.5 Gensys Form
Putting the linearized system into matrix form, we obtain
G0 Xt+1 = G1 Xt + Ψεt+1 + Πηt+1 ,
where, ordering the variables as $X_t = (c_t, k_t, n_t, y_t, r_t, a_t)'$ and the forecast errors as $\eta_t = (\eta^c_t, \eta^r_t)'$,
\[
G_0 = \begin{bmatrix}
1 & 0 & 0 & 0 & -1 & 0\\
-1 & 0 & -\frac{1}{1-N} & 1 & 0 & 0\\
C & K & 0 & -Y & 0 & 0\\
0 & 0 & -\alpha & 1 & 0 & -1\\
0 & 0 & 0 & -(1-\alpha)(Y/K) & R & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},
\qquad
G_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & (1-\delta)K & 0 & 0 & 0 & 0\\
0 & 1-\alpha & 0 & 0 & 0 & 0\\
0 & -(1-\alpha)(Y/K) & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & \rho
\end{bmatrix},
\]
and
\[
\Pi = \begin{bmatrix}
1 & -1\\ 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0
\end{bmatrix},
\qquad
\Psi = \begin{bmatrix}
0\\ 0\\ 0\\ 0\\ 0\\ 1
\end{bmatrix},
\qquad
X_{t+1} = \begin{bmatrix}
c_{t+1}\\ k_{t+1}\\ n_{t+1}\\ y_{t+1}\\ r_{t+1}\\ a_{t+1}
\end{bmatrix}.
\]
Now that we have set up the appropriate matrices, all that remains is to enter them row by row into Matlab, as in the sketch below.
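A minimal sketch of that step follows. It assumes Sims's gensys.m is on the Matlab path; the signature shown in the comment is the commonly used one, but you should check your local copy. The persistence parameter ρ is illustrative, and the steady state is computed as in Section 11.2.3.

    % Parameters and steady state (rho is illustrative)
    alpha = 0.666; beta = 0.99; delta = 0.1; rho = 0.95; N = 1/3;
    R  = 1/beta;
    KY = (1-alpha)/(R + delta - 1);  CY = 1 - delta*KY;
    NY = KY^((alpha-1)/alpha);
    Y = N/NY;  C = CY*Y;  K = KY*Y;

    % Variable ordering: X = [c k n y r a]'
    G0 = [ 1,  0,        0,                  0, -1,  0;
          -1,  0, -1/(1-N),                  1,  0,  0;
           C,  K,        0,                 -Y,  0,  0;
           0,  0,   -alpha,                  1,  0, -1;
           0,  0,        0,  -(1-alpha)*(Y/K),  R,  0;
           0,  0,        0,                  0,  0,  1];
    G1 = [ 1,                 0, 0, 0, 0,   0;
           0,                 0, 0, 0, 0,   0;
           0,     (1-delta)*K,   0, 0, 0,   0;
           0,         1-alpha,   0, 0, 0,   0;
           0, -(1-alpha)*(Y/K),  0, 0, 0,   0;
           0,                 0, 0, 0, 0, rho];
    Psi = [0; 0; 0; 0; 0; 1];
    Pi  = [1, -1; 0, 0; 0, 0; 0, 0; 0, 0; 0, 0];
    CC  = zeros(6,1);

    % Assumed signature of Sims's gensys.m (check your local copy):
    %   [G, C, impact, fmat, fwt, ywt, gev, eu] = gensys(g0, g1, c, psi, pi, div)
    [G, Cvec, impact, ~, ~, ~, ~, eu] = gensys(G0, G1, CC, Psi, Pi, 1.01);
    disp(eu)      % eu = [1; 1] indicates existence and uniqueness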
11.2.6 Gensys: Behind the Code
In order to more clearly understand the techniques applied by the program, recall the growth model introduced in Section 11.1. Assuming $\tau_t = 0$ for all $t$ and that $a_t$ is i.i.d. with mean zero, equations (11.5) and (11.6) can be written as
\[
-c_t = (\alpha-1)k_t - E_tc_{t+1}, \qquad
c_t = \frac{1}{1-\alpha\beta}a_t + \frac{\alpha}{1-\alpha\beta}k_{t-1} - \frac{\alpha\beta}{1-\alpha\beta}k_t.
\]
Defining the forecast error as $\eta_{t+1} = c_{t+1} - E_tc_{t+1}$, the above system is given by
\[
\begin{bmatrix} c_{t+1}\\ k_{t+1}\end{bmatrix}
= \begin{bmatrix} 1 & \alpha-1\\[2pt] \frac{\alpha\beta-1}{\alpha\beta} & \frac{1-\alpha\beta(1-\alpha)}{\alpha\beta}\end{bmatrix}
\begin{bmatrix} c_t\\ k_t\end{bmatrix}
+ \begin{bmatrix} 1\\[2pt] \frac{\alpha\beta-1}{\alpha\beta}\end{bmatrix}\eta_{t+1}
+ \begin{bmatrix} 0\\[2pt] \frac{1}{\alpha\beta}\end{bmatrix}a_{t+1},
\]
which can be written more compactly as
\[
x_{t+1} = Ax_t + B\xi_{t+1}, \tag{11.15}
\]
where $A$ is the coefficient matrix in the above system and
\[
x_t = (c_t, k_t)', \qquad \xi_t = (a_t, \eta_t)', \qquad
B = \begin{bmatrix} 0 & 1\\[2pt] \frac{1}{\alpha\beta} & \frac{\alpha\beta-1}{\alpha\beta}\end{bmatrix}.
\]
Solving for the zeros of the characteristic equation, $\det(A-\lambda I) = 0$, implies that the eigenvalues are given by $\lambda_1 = \alpha < 1$ and $\lambda_2 = (\alpha\beta)^{-1} > 1$. In order to solve for the corresponding eigenvectors, $v_i = [v_{1i}\;\; v_{2i}]'$, evaluate the equation $Av_i = \lambda_iv_i$ for each eigenvalue $i = 1, 2$. That is, for $\lambda_1 = \alpha$,
\[
A - \alpha I = \begin{bmatrix} 1-\alpha & \alpha-1\\[2pt] \frac{\alpha\beta-1}{\alpha\beta} & \frac{1-\alpha\beta}{\alpha\beta}\end{bmatrix}
\;\overset{\text{rref}}{\sim}\;
\begin{bmatrix} 1 & -1\\ 0 & 0\end{bmatrix}.
\]
Thus, normalizing $v_{21} = 1$ implies that the eigenvector corresponding to $\lambda_1$ is $v_1 = [1\;\; 1]'$. For $\lambda_2 = (\alpha\beta)^{-1}$,
\[
A - (\alpha\beta)^{-1}I = \begin{bmatrix} \frac{\alpha\beta-1}{\alpha\beta} & \alpha-1\\[2pt] \frac{\alpha\beta-1}{\alpha\beta} & \alpha-1\end{bmatrix}
\;\overset{\text{rref}}{\sim}\;
\begin{bmatrix} 1 & \frac{\alpha\beta(\alpha-1)}{\alpha\beta-1}\\ 0 & 0\end{bmatrix}.
\]
Thus, normalizing $v_{22} = 1$ implies that the eigenvector corresponding to $\lambda_2$ is $v_2 = [\alpha\beta(\alpha-1)/(1-\alpha\beta)\;\; 1]'$. Making use of the Jordan decomposition of the coefficient matrix, $A = W\Lambda W^{-1}$, we denote $W_{\cdot j}$ as the $j$th column of $W$ and $W^{j\cdot}$ as the $j$th row of $W^{-1}$. Given the above results,
\[
W = \begin{bmatrix} 1 & \frac{\alpha\beta(\alpha-1)}{1-\alpha\beta}\\[2pt] 1 & 1\end{bmatrix},
\qquad
W^{-1} = \begin{bmatrix} \frac{1-\alpha\beta}{1-\alpha^2\beta} & \frac{\alpha\beta(1-\alpha)}{1-\alpha^2\beta}\\[2pt] \frac{\alpha\beta-1}{1-\alpha^2\beta} & \frac{1-\alpha\beta}{1-\alpha^2\beta}\end{bmatrix}.
\]
Iterating (11.15) backwards yields
\[
x_t = A^tx_0 + \sum_{s=0}^{t-1}A^sB\xi_{t-s}.
\]
Applying the Jordan Decomposition, we obtain
\[
x_t = \sum_{j=1}^{2}W_{\cdot j}\lambda_j^tW^{j\cdot}x_0 + \sum_{j=1}^{2}W_{\cdot j}\sum_{s=0}^{t-1}\lambda_j^sW^{j\cdot}B\xi_{t-s}.
\]
To eliminate the influence of explosive eigenvalues ($|\lambda_j| > 1$), we need to impose that for each explosive eigenvalue
\[
W^{j\cdot}x_t = 0, \qquad t = 0, 1, 2, \ldots
\]
or, equivalently,
\[
W^{j\cdot}x_0 = 0 \qquad\text{and}\qquad W^{j\cdot}B\xi_t = 0, \qquad t = 0, 1, 2, \ldots
\]
The first condition places the solution onto the stable manifold, while the second condition ensures that exogenous shocks affect endogenous forecast errors in a way that keeps the solution on the stable manifold.

In this model, $|\lambda_1| < 1$ and $|\lambda_2| > 1$. Thus, we need to squash the influence of $\lambda_2 = \frac{1}{\alpha\beta}$ by setting
\[
W^{2\cdot}x_t = \begin{bmatrix} \frac{\alpha\beta-1}{1-\alpha^2\beta} & \frac{1-\alpha\beta}{1-\alpha^2\beta}\end{bmatrix}
\begin{bmatrix} c_t\\ k_t\end{bmatrix} = 0,
\]
which implies that $c_t = k_t$. Also,
\[
W^{2\cdot}B\xi_t = \begin{bmatrix} \frac{\alpha\beta-1}{1-\alpha^2\beta} & \frac{1-\alpha\beta}{1-\alpha^2\beta}\end{bmatrix}
\begin{bmatrix} 0 & 1\\[2pt] \frac{1}{\alpha\beta} & \frac{\alpha\beta-1}{\alpha\beta}\end{bmatrix}
\begin{bmatrix} a_t\\ \eta_t\end{bmatrix} = 0,
\]
implying that ηt = at . Given the above result, we know ct+1 − kt+1 = 0 and ηt+1 = at+1 . Thus,
the system can be rewritten as
\[
\begin{bmatrix} 1 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix} c_{t+1}\\ k_{t+1}\end{bmatrix}
= \begin{bmatrix} 1 & \alpha-1\\ 0 & 0\end{bmatrix}\begin{bmatrix} c_t\\ k_t\end{bmatrix}
+ \begin{bmatrix} 1\\ 0\end{bmatrix}a_{t+1},
\]
which, using $c_t = k_t$, yields the solution
\[
k_{t+1} = \alpha k_t + a_{t+1}, \qquad c_{t+1} = \alpha k_t + a_{t+1}.
\]
Given that
ct+1 − Et ct+1 ≡ ηt+1 = at+1 ,
Gensys will return the solution in the form xt+1 = Gxt + M at+1 , where
\[
G = \begin{bmatrix} 0 & \alpha\\ 0 & \alpha\end{bmatrix}, \qquad M = \begin{bmatrix} 1\\ 1\end{bmatrix}.
\]
Chapter 12
Introduction to Continuous Time Models
12.1 Maximum Principle
The typical continuous time optimization problem is given by
\[
\max_{s}\int_{0}^{\infty}e^{-\rho t}f(s, k, \alpha)\,dt
\quad\text{subject to}\quad \dot k_i = g^i(s, k), \quad i = 1, \ldots, n,
\]
where $\rho$ is the time preference rate, $\alpha$ is some exogenous variable, $s = (s_1, \ldots, s_m)$ is a vector of control variables, and $k = (k_1, \ldots, k_n)$ is a vector of state variables.
Dynamic problems in discrete time are typically solved using a Lagrangian. A similar approach
can be utilized to solve dynamic problems in continuous time by forming the following Hamiltonian:
\[
H = e^{-\rho t}\left\{f(s, k, \alpha) + \sum_{i=1}^{n}\phi_ig^i(s, k)\right\},
\]
where $\phi_i$ are current value multipliers. Alternatively, the Hamiltonian could have been formulated as follows:
\[
H = e^{-\rho t}f(s, k, \alpha) + \sum_{i=1}^{n}\tilde\phi_ig^i(s, k),
\]
where $\tilde\phi_i = e^{-\rho t}\phi_i$ is the present value multiplier. The current value Hamiltonian is then defined to be
\[
H^c \equiv e^{\rho t}H = f(s, k, \alpha) + \sum_{i=1}^{n}\phi_ig^i(s, k).
\]
In general, the multipliers can be viewed as shadow prices. Specifically, they measure the incremental change in the value of the objective function from an exogenous increase in state variable
ki . According to the maximum principle, we have the following first order conditions
\[
H_{s_j} = 0 \;\;\rightarrow\;\; f_{s_j}(\cdot) + \sum_{i=1}^{n}\phi_ig^i_{s_j}(\cdot) = 0, \qquad j = 1, \ldots, m \tag{12.1}
\]
\[
H_{s_js_j} \le 0 \;\;\rightarrow\;\; f_{s_js_j}(\cdot) + \sum_{i=1}^{n}\phi_ig^i_{s_js_j}(\cdot) \le 0, \qquad j = 1, \ldots, m \tag{12.2}
\]
\[
\frac{\partial H}{\partial\tilde\phi_i} = \frac{\partial H}{\partial(e^{-\rho t}\phi_i)} = \dot k_i \;\;\rightarrow\;\; \dot k_i = g^i(s, k), \qquad i = 1, \ldots, n \tag{12.3}
\]
\[
\frac{d(e^{-\rho t}\phi_a)}{dt} = -H_{k_a} \;\;\rightarrow\;\; \dot\phi_a = \rho\phi_a - f_{k_a}(\cdot) - \sum_{i=1}^{n}\phi_ig^i_{k_a}, \qquad a = 1, \ldots, n \tag{12.4}
\]
Equations (12.1) and (12.2) are respectively the first and second derivatives of the Hamiltonian with
respect to all control variables. These conditions are analogous to the necessary and sufficient conditions of a typical optimization problem. Equation (12.3) simply recovers the constraints. Equation
(12.4) equates the time derivative of the present value multiplier to the negative of the partial of the
Hamiltonian with respect to all the state variables. Given the interpretation of the multiplier, this condition says that along the optimal path the return on asset $k_a$, inclusive of capital gains $\dot\phi_a/\phi_a$, must equal the discount rate $\rho$.
12.2 Ramsey Problem
12.2.1 Model Setup
Consider the following problem
\[
\max_{c}\int_{0}^{\infty}e^{-\rho t}\ln(c(t))\,dt
\quad\text{subject to}\quad \dot k = k(t)^{\alpha} - c(t) - \delta k(t).
\]
Then the present value Hamiltonian is given by
\[
H = e^{-\rho t}\left\{\ln c(t) + \phi(t)\left[k(t)^{\alpha} - c(t) - \delta k(t)\right]\right\}, \tag{12.5}
\]
where $\phi$ is the current value multiplier associated with the state variable $k$. Using the maximum principle, we have the following first order conditions:
\[
H_c = 0 \;\;\rightarrow\;\; \frac{1}{c(t)} = \phi(t) \tag{12.6}
\]
\[
\frac{d}{dt}(e^{-\rho t}\phi) = -H_k \;\;\rightarrow\;\; \dot\phi(t) = \phi(t)\left[\rho + \delta - \alpha k(t)^{\alpha-1}\right] \tag{12.7}
\]
\[
\frac{\partial H}{\partial(e^{-\rho t}\phi)} = \dot k \;\;\rightarrow\;\; \dot k = k(t)^{\alpha} - c(t) - \delta k(t) \tag{12.8}
\]
Applying time derivatives to (12.6), we obtain
\[
-\frac{1}{c(t)^2}\dot c = \dot\phi.
\]
Substituting the co-state equation (12.7) and using (12.6), we obtain
\[
\dot c = c(t)\left[\alpha k(t)^{\alpha-1} - \delta - \rho\right]. \tag{12.9}
\]
In order to construct a phase diagram, we can conduct the following analysis.
[Figure 12.1: Ramsey Model Phase Diagram]

Equation (12.9) implies that
\[
\frac{\dot c}{c(t)} = 0 \;\text{ if } k(t) = \bar k; \qquad
\frac{\dot c}{c(t)} > 0 \;\text{ if } k(t) < \bar k; \qquad
\frac{\dot c}{c(t)} < 0 \;\text{ if } k(t) > \bar k, \tag{12.10}
\]
where $\bar k \equiv \left(\frac{\alpha}{\delta+\rho}\right)^{1/(1-\alpha)}$ follows from evaluating the Euler equation at steady state (i.e., $\dot c = 0$).
The budget constraint, (12.8), implies that
\[
\dot k = 0 \;\text{ if } c(t) = k(t)^{\alpha} - \delta k(t); \qquad
\dot k > 0 \;\text{ if } c(t) < k(t)^{\alpha} - \delta k(t); \qquad
\dot k < 0 \;\text{ if } c(t) > k(t)^{\alpha} - \delta k(t). \tag{12.11}
\]
Notice that $k(t)^{\alpha} - \delta k(t)$ reaches its maximum at $k^* = (\alpha/\delta)^{1/(1-\alpha)} > \bar k$. This fact, along with relationships (12.10) and (12.11), yields the phase diagram shown in Figure 12.1 above.
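For concreteness, the two loci in Figure 12.1 can be traced out numerically. The sketch below plots the $\dot k = 0$ locus, $c = k^{\alpha} - \delta k$, and the vertical $\dot c = 0$ locus at $k = \bar k$ for illustrative parameter values.

    alpha = 0.36; delta = 0.05; rho = 0.04;          % illustrative parameters
    kbar  = (alpha/(delta+rho))^(1/(1-alpha));       % cdot = 0 locus
    kstar = (alpha/delta)^(1/(1-alpha));             % maximizer of k^alpha - delta*k
    kgrid = linspace(0.01, 1.2*kstar, 200);
    cnull = kgrid.^alpha - delta*kgrid;              % kdot = 0 locus
    plot(kgrid, cnull); hold on;
    plot([kbar kbar], [0 max(cnull)]);               % vertical cdot = 0 locus
    xlabel('k'); ylabel('c'); legend('kdot = 0', 'cdot = 0');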
12.2.2 Linear System
Linearizing the equilibrium conditions, we obtain
\[
\dot c = (\alpha\bar k^{\alpha-1} - \delta - \rho)\,dc + \alpha(\alpha-1)\bar c\,\bar k^{\alpha-2}\,dk
\]
\[
\dot k = -dc + (\alpha\bar k^{\alpha-1} - \delta)\,dk,
\]
where the differential, $dx \equiv x(t) - \bar x$, represents the deviation from steady state. Using the fact that in steady state
\[
\bar k = \left(\frac{\alpha}{\rho+\delta}\right)^{1/(1-\alpha)}, \qquad
\bar c = \left(\frac{\alpha}{\rho+\delta}\right)^{1/(1-\alpha)}\frac{\rho+\delta-\alpha\delta}{\alpha},
\]
the linearized system becomes
\[
\dot c = 0\,dc + \frac{(\alpha-1)(\rho+\delta)(\rho+\delta-\alpha\delta)}{\alpha}\,dk
\]
\[
\dot k = -dc + \rho\,dk.
\]
Thus the linearized system has the form
\[
\begin{bmatrix}\dot c\\ \dot k\end{bmatrix} =
\underbrace{\begin{bmatrix} 0 & \frac{(\alpha-1)(\rho+\delta)(\rho+\delta-\alpha\delta)}{\alpha}\\[2pt] -1 & \rho\end{bmatrix}}_{A}
\begin{bmatrix} dc\\ dk\end{bmatrix}.
\]
In order to conduct stability analysis, we must first calculate the eigenvalues of $A$:
\[
\det(A - \lambda I) = \lambda^2 - \rho\lambda + \frac{(\alpha-1)(\rho+\delta)(\rho+\delta-\alpha\delta)}{\alpha} \overset{\text{set}}{=} 0.
\]
Thus, the eigenvalues are given by
\[
\lambda_{1,2} = \frac{\rho}{2} \pm \sqrt{\frac{\rho^2}{4} - \frac{(\alpha-1)(\rho+\delta)(\rho+\delta-\alpha\delta)}{\alpha}}.
\]
Since $\alpha < 1$, the term under the square root exceeds $\rho^2/4$, so the square root is larger than $\rho/2$. Hence $\lambda_1 < 0 < \lambda_2$, and the steady state is a saddle point.
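The saddle-point property is easy to verify numerically: build A at the steady state and check that its eigenvalues have opposite signs. A sketch with illustrative parameter values:

    alpha = 0.36; delta = 0.05; rho = 0.04;      % illustrative parameters
    b = (alpha-1)*(rho+delta)*(rho+delta-alpha*delta)/alpha;   % negative since alpha < 1
    A = [ 0, b;
         -1, rho];
    lambda = sort(eig(A));
    fprintf('lambda1 = %.4f, lambda2 = %.4f\n', lambda(1), lambda(2))
    % One negative and one positive eigenvalue confirms the saddle point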