An Introduction to Mathematical Economics, Part 1

Michael Sampson

Loglinear Publishing

Copyright © 2001 Michael Sampson. Loglinear Publications: http://www.loglinear.com. Email: [email protected].

Terms of Use

This document is distributed "AS IS" and with no warranties of any kind, whether express or implied. Until November 1, 2001 you are hereby given permission to print one (1) and only one hardcopy version free of charge from the electronic version of this document (i.e., the pdf file) provided that:

1. The printed version is for your personal use only.
2. You make no further copies from the hardcopy version. In particular no photocopies, electronic copies or any other form of reproduction.
3. You agree not to ever sell the hardcopy version to anyone else.
4. You agree that if you ever give the hardcopy version to anyone else, this page, in particular the Copyright Notice and the Terms of Use, is included and the person to whom the copy is given accepts these Terms of Use.

Until November 1, 2001 you are hereby given permission to make (and if you wish sell) an unlimited number of copies on paper only from the electronic version (i.e., the pdf file) of this document, or from a printed copy of the electronic version of this document, provided that:

1. You agree to pay a royalty of either $3.00 Canadian or $2.00 US per copy to the author within 60 days of making the copies, or to destroy any copies after 60 days for which you have not paid the royalty. Payment can be made either by cheque or money order and should be sent to the author at: Professor Michael Sampson, Department of Economics, Concordia University, 1455 de Maisonneuve Blvd W., Montreal, Quebec, Canada, H3G 1M8.
2. If you intend to make five or more copies, or if you can reasonably expect that five or more copies of the text will be made, then you agree to notify the author before making any copies by Email at [email protected] or by fax at 514-848-4536.
3. You agree to include on each paper copy of this document, at the same page number as this page in the electronic version: 1) the above Copyright Notice, 2) the URL http://www.loglinear.com and the Email address [email protected]. You may then, if you wish, remove these Terms of Use from the paper copies you make.

Contents

1 The Mathematical Method
  1.1 Definitions
  1.2 The Difference Between '=' and '≡'
  1.3 Implication
  1.4 Negation
  1.5 Proof by Contradiction
  1.6 Necessary Conditions and Sufficient Conditions
  1.7 Necessary and Sufficient Conditions
  1.8 'Or' and 'And'
  1.9 The Quantifiers ∃ and ∀
  1.10 Proof by Counter-Example
  1.11 Proof by Induction
  1.12 Functions
    1.12.1 Integer Exponents
    1.12.2 Polynomials
    1.12.3 Non-integer Exponents
    1.12.4 The Geometric Series

2 Univariate Calculus
  2.1 Derivatives
    2.1.1 Slopes
    2.1.2 Derivatives
    2.1.3 The Use of the Word 'Marginal' in Economics
    2.1.4 Elasticities
    2.1.5 The Constant Elasticity Functional Form
    2.1.6 Local and Global Properties
    2.1.7 The Sum, Product and Quotient Rules
    2.1.8 The Chain Rule
    2.1.9 Inverse Functions
    2.1.10 The Derivative of an Inverse Function
    2.1.11 The Elasticity of an Inverse Function
  2.2 Second Derivatives
    2.2.1 Convexity and Concavity
    2.2.2 Economics and 'Diminishing Marginal ...'
  2.3 Maximization and Minimization
    2.3.1 First-Order Conditions
    2.3.2 Second-Order Conditions
    2.3.3 Sufficient Conditions for a Global Maximum or Minimum
    2.3.4 Profit Maximization
  2.4 Econometrics
    2.4.1 Least Squares Estimation
    2.4.2 Maximum Likelihood
  2.5 Ordinal and Cardinal Properties
    2.5.1 Class Grades
    2.5.2 Ordinal and Cardinal Properties of Functions
    2.5.3 Concavity and Convexity are Cardinal Properties
    2.5.4 Quasi-Concavity and Quasi-Convexity
    2.5.5 New Sufficient Conditions for a Global Maximum or Minimum
  2.6 Exponential Functions and Logarithms
    2.6.1 Exponential Growth and the Rule of 72
  2.7 Taylor Series
    2.7.1 The Error of the Taylor Series Approximation
    2.7.2 The Taylor Series for e^x and ln(1+x)
    2.7.3 L'Hôpital's Rule
    2.7.4 Newton's Method
  2.8 Technical Issues
    2.8.1 Continuity and Differentiability
    2.8.2 Corner Solutions
    2.8.3 Advanced Concavity and Convexity

3 Matrix Algebra
  3.1 Matrix Addition and Subtraction
    3.1.1 The Matrix 0
  3.2 Matrix Multiplication
    3.2.1 The Identity Matrix
  3.3 The Transpose of a Matrix
    3.3.1 Symmetric Matrices
    3.3.2 Proof that A^T A is Symmetric
  3.4 The Inverse of a Matrix
    3.4.1 Diagonal Matrices
  3.5 The Determinant of a Matrix
    3.5.1 Determinants of Upper and Lower Triangular Matrices
    3.5.2 Calculating the Inverse of a Matrix with Determinants
  3.6 The Trace of a Matrix
  3.7 Higher Dimensional Spaces
    3.7.1 Vectors as Points in an n Dimensional Space: ℝⁿ
    3.7.2 Length and Distance
    3.7.3 Angle and Orthogonality
    3.7.4 Linearly Independent Vectors
  3.8 Solving Systems of Equations
    3.8.1 Cramer's Rule
  3.9 Eigenvalues and Eigenvectors
    3.9.1 Eigenvalues
    3.9.2 Eigenvectors
    3.9.3 The Relationship A = CΛC⁻¹
    3.9.4 Left and Right-Hand Eigenvectors
    3.9.5 Symmetric and Orthogonal Matrices
  3.10 Linear and Quadratic Functions in ℝⁿ⁺¹
    3.10.1 Linear Functions
    3.10.2 Quadratics
    3.10.3 Positive and Negative Definite Matrices
    3.10.4 Using Determinants to Check for Definiteness
    3.10.5 Using Eigenvalues to Check for Definiteness
    3.10.6 Maximizing and Minimizing Quadratics
  3.11 Idempotent Matrices
    3.11.1 Important Properties of Idempotent Matrices
    3.11.2 The Spectral Representation
  3.12 Positive Matrices
    3.12.1 The Perron-Frobenius Theorem
    3.12.2 Markov Chains
    3.12.3 General Equilibrium and Matrix Algebra

4 Multivariate Calculus
  4.1 Functions of Many Variables
  4.2 Partial Derivatives
    4.2.1 The Gradient
    4.2.2 Interpreting Partial Derivatives
    4.2.3 The Economic Language of Partial Derivatives
    4.2.4 The Use of the Word Marginal
    4.2.5 Elasticities
    4.2.6 The Chain Rule
    4.2.7 A More General Multivariate Chain Rule
    4.2.8 Homogeneous Functions
    4.2.9 Homogeneity and the Absence of Money Illusion
    4.2.10 Homogeneity and the Nature of Technology
  4.3 Second-Order Partial Derivatives
    4.3.1 The Hessian
    4.3.2 Concavity and Convexity
    4.3.3 First and Second-Order Taylor Series
  4.4 Unconstrained Optimization
    4.4.1 First-Order Conditions
    4.4.2 Second-Order Conditions
  4.5 Quasi-Concavity and Quasi-Convexity
    4.5.1 Ordinal and Cardinal Properties
    4.5.2 Sufficient Conditions for a Global Maximum or Minimum
    4.5.3 Indifference Curves and Quasi-Concavity
  4.6 Constrained Optimization
    4.6.1 The Lagrangian
    4.6.2 First-Order Conditions
    4.6.3 Second-Order Conditions
    4.6.4 Sufficient Conditions for a Global Maximum or Minimum
  4.7 Econometrics
    4.7.1 Linear Regression
    4.7.2 Maximum Likelihood

Preface

I would like to thank my students for struggling through earlier versions of this text. In particular I would like to thank Maxime Comeau, Bulent Yurtsever, Patricia Carvajal, Alain Lumbroso and Saif Al-Haroun for pointing out errors and typos.

Here are some points of view on economics and mathematics:

It is clear that Economics, if it is to be a science at all, must be a mathematical science. -William Jevons (Jevons was one of the early mathematical economists.)

There can be no question, however, that prolonged commitment to mathematical exercises in economics can be damaging. It leads to the atrophy of judgement and intuition. -John Kenneth Galbraith (Galbraith is a famous Canadian economist; an advisor to President Kennedy in the 1960's; author of many popular books of which our former prime minister Trudeau was a big fan. Gets no respect from academic economists.)

The age of chivalry is gone. That of sophisters, economists and calculators has succeeded. -Edmund Burke

I advise my students to listen carefully the moment they decide to take no more mathematics courses. They might be able to hear the sound of closing doors. -James Caballero

The effort of the economist is to 'see,' to picture the interplay of economic elements. The more clearly cut these elements appear in his vision, the better; the more elements he can grasp and hold in his mind at once, the better. The economic world is a misty region. The first explorers used unaided vision. Mathematics is the lantern by which what before was dimly visible now looms up in firm, bold outlines. The old phantasmagoria disappear. We see better. We also see further. -Irving Fisher (early 20th century US monetary economist, famous for the Fisher equation: nominal interest rate equals real interest rate plus the rate of inflation)

In mathematics you don't understand things. You just get used to them. -John von Neumann (One of the great mathematical brains of the 20th century. Famous in economics for developing game theory and for the von Neumann growth model.)

One of the big misapprehensions about mathematics that we perpetrate in our classrooms is that the teacher always seems to know the answer to any problem that is discussed. This gives students the idea that there is a book somewhere with all the right answers to all of the interesting questions, and that teachers know those answers. And if one could get hold of the book, one would have everything settled.
That's so unlike the true nature of mathematics. -Leon Henkin

Mathematics. Let us introduce the refinement and rigor of mathematics into all sciences as far as this is at all possible, not in the faith that this will lead us to know things but in order to determine our human relation to things. Mathematics is merely the means for general and ultimate knowledge of man. -Friedrich Nietzsche (19th century philosopher, an atheist, famous for his claim that "God is dead".)

If we have no aptitude or natural taste for geometry, this does not mean that our faculty for attention will not be developed by wrestling with a problem or studying a theorem. On the contrary it is almost an advantage. It does not even matter much whether we succeed in finding the solution or understanding the proof, although it is important to try really hard to do so. Never in any case whatever is a genuine effort of the attention wasted. It always has its effect on the spiritual plane and in consequence on the lower one of the intelligence, for all spiritual light lightens the mind. If we concentrate our attention on trying to solve a problem of geometry, and if at the end of an hour we are no nearer to doing so than at the beginning, we have nevertheless been making progress each minute of that hour in another more mysterious dimension. Without our knowing or feeling it, this apparently barren effort has brought more light into the soul. The result will one day be discovered in prayer. Moreover, it may very likely be felt in some department of the intelligence in no way connected with mathematics. Perhaps he who made the unsuccessful effort will one day be able to grasp the beauty of a line of Racine more vividly on account of it. But it is certain that this effort will bear its fruit in prayer. There is no doubt whatever about that. -Simone Weil (20th century Christian mystic; her brother Andre Weil was one of the great mathematicians of the 20th century.)

Chapter 1
The Mathematical Method

1.1 Definitions

Mathematics has no symbols for confused ideas. -Anonymous

"When I use a word," Humpty Dumpty said in a rather scornful tone, "it means just what I choose it to mean - neither more nor less." "The question is," said Alice, "whether you can make words mean different things." "The question is," said Humpty Dumpty, "which is to be master - that's all." -Lewis Carroll, Through the Looking Glass

In economics we strive for precise thinking, and one of the ways we do this is by using mathematics. The beginning of this practice is to be clear about what we are talking about, and for this we need definitions.

We begin with some elementary number theory in order to illustrate the mathematical methods that we will later apply to economic models. Suppose then we are interested in the properties of odd and even numbers. Intuitively you may know that 4 is even and 5 is odd. If however we wish to prove things about odd and even numbers, then we have to be able to define what we mean by an odd and an even number.

Consider then proving that the product of an odd and an even number is always an even number. It is not enough to make a list such as:

$$4 \times 5 = 20, \quad 2 \times 3 = 6, \quad 12 \times 37 = 444, \quad \text{etc.}$$

and note that 20, 6 and 444 are even numbers. This is not a proof! Nor would it be a proof to make the list even longer, because there are an infinite number of odd and even combinations. Without definitions we have nowhere to begin!
Now one possible definition of even and odd numbers would be:

Definition 1: An integer $m$ is an even number if and only if there exists an integer $n$ such that $m = 2 \times n$.

Definition 2: An integer $m$ is an odd integer if and only if there exists an integer $n$ such that $m = 2 \times n + 1$.

For example, according to the definition 18 is an even integer because we can write it as $18 = 2 \times n$ where $n = 9$, while 5 is an odd integer because we can write it as $5 = 2 \times n + 1$ where $n = 2$.

Armed with these definitions we can now prove something:

Theorem 3: The product of an odd and an even number is an even number.

Proof. If $a$ is even and $b$ is odd then $a = 2m$ and $b = 2n + 1$ for some integers $m$ and $n$, and:
$$a \times b = 2m \times (2n+1) = 2 \times (m \times (2n+1)) = 2 \times r$$
where $r = m \times (2n+1)$ is an integer. Thus $a \times b$ is an even number.

Notice the power of this kind of reasoning. In a few short lines we have been able to prove a result that applies to an infinity of numbers! This infinity is a list of numbers which would go past the moon or even past the most distant star, and yet we are able to say something quite definite about it. This is the magic of mathematics!
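A computer check is no substitute for this kind of proof, but it can illustrate it. The short Python sketch below (the code and its names are ours, chosen for illustration) tests the claim of Theorem 3 on thousands of randomly chosen odd/even pairs:

```python
import random

def is_even(m):
    # m is even if and only if m = 2*n for some integer n (Definition 1)
    return m % 2 == 0

# Test Theorem 3 on many random pairs: a quick sanity check that
# complements, but cannot replace, the general proof above.
for _ in range(10_000):
    a = 2 * random.randint(-1000, 1000)        # an even number, a = 2m
    b = 2 * random.randint(-1000, 1000) + 1    # an odd number, b = 2n + 1
    assert is_even(a * b), (a, b)
print("odd * even was even in every trial")
```

However many trials pass, the list argument remains incomplete; only the proof covers every pair at once.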
1.2 The Difference Between '=' and '≡'

One day in microeconomics, the professor was writing up the typical "underlying assumptions" in preparation to explain a new model. I turned to my friend and asked, "What would Economics be without assumptions?" He thought for a moment, then replied, "Accounting."

Sometimes things are equal to each other simply by definition. For example if

A = "the number of bachelors in Montreal"
B = "the number of unmarried men in Montreal"

then A and B are equal to each other by definition. There is nothing to prove here and it says nothing about the world or Montreal. To emphasize the nature of this kind of equality we use a special kind of equal sign, '≡', so that for bachelors and unmarried men in Montreal we write:
$$A \equiv B.$$
This says that A and B are equal by definition, or that this is an accounting identity. When you see this equality sign you can relax! There is nothing to prove; these are merely different notations that mean the same thing. In economics a good example of an accounting identity is the GNP identity you learn in macroeconomics:
$$Y \equiv C + I + G + X - M$$
where $Y$ is GNP, $C$ is consumption, $I$ is investment, $G$ is government expenditure, $X$ is exports and $M$ is imports.

On the other hand, sometimes things are equal in a more important way. For example $E = mc^2$ expresses an important fact in physics, while $f(x) = x^2$ and $f'(x) = 2x$ give us real information about the function $f(x)$. In these cases we use '=' as a way of emphasizing that real information is being provided.

1.3 Implication

In mathematical economics we begin with assumptions and from there attempt to deduce true implications of these assumptions. Fundamental to this kind of reasoning is the idea of logical implication: that if A is true then it follows that B must also be true. We write this formally as:
$$A \Longrightarrow B,$$
which is to say that A implies B. (Sometimes you will see the notation $A \supset B$ instead.)

Example 1: If

A = "Mr. Smith lives in Montreal"
B = "Mr. Smith lives in the province of Quebec"

then since the city of Montreal is in the province of Quebec it follows that $A \Longrightarrow B$.

We will often be attempting to construct proofs of statements like $A \Longrightarrow B$. Often the link between A and B is not obvious and we need to find a series of intermediate implications, so that a proof takes the form:
$$A \Longrightarrow S_1 \Longrightarrow S_2 \Longrightarrow \cdots \Longrightarrow S_n \Longrightarrow B$$
from which we conclude that $A \Longrightarrow B$. Thus the general strategy in proving $A \Longrightarrow B$ is to begin with A and to use a series of correct implications to finally obtain the statement B.

Example 2: Suppose that:

A = "a is odd and b is odd"
B = "a + b is an even number"

and we wish to prove that $A \Longrightarrow B$; that is:

Theorem 4: The sum of two odd numbers is even.

Proof. Given A it follows that $a$ and $b$ are odd, so that $a = 2r + 1$ and $b = 2s + 1$ for some integers $r$ and $s$. Then:
$$A \Longrightarrow a = 2r+1,\; b = 2s+1 \Longrightarrow a + b = (2r+1) + (2s+1) \Longrightarrow a + b = 2(r+s+1) \Longrightarrow a + b = 2t \text{ where } t = r+s+1 \Longrightarrow \text{“} a + b \text{ is an even number”} = B.$$

Note that there is a direction to the arrow $\Longrightarrow$. This is to convey the idea that the truth of statement A is communicated to the truth of the statement B, but it is not necessarily the case that the truth of B implies the truth of A. It is incorrect to conclude from $A \Longrightarrow B$ that $B \Longrightarrow A$.

Example 1: If B is true, so that Mr. Smith lives in the province of Quebec, we cannot conclude that A is true, that he lives in the city of Montreal. He may for example live in another city in Quebec, say Sherbrooke. Thus $A \Longrightarrow B$ is true while $B \Longrightarrow A$ is false.

Example 2: If B is "the sum a + b is an even number", we cannot conclude A, that "a and b are each odd numbers". For example if $a = 4$ and $b = 6$ then B is true since $4 + 6 = 10$, but A is not true since neither $a$ nor $b$ is odd. Thus $A \Longrightarrow B$ is true while $B \Longrightarrow A$ is false.

1.4 Negation

Let $\sim A$ denote the negation of A; that is, "not A", or "A is not true", or "A is false". For example if A is the statement "Mr. Smith lives in Montreal" then $\sim A$ is the statement "Mr. Smith does not live in Montreal". The negation sign acts like a negative sign in arithmetic since:
$$\sim(\sim A) = A.$$

If we have shown $A \Longrightarrow B$, we have seen that we cannot conclude from this that $B \Longrightarrow A$. However we can correctly conclude from $A \Longrightarrow B$ that:
$$\sim B \Longrightarrow\; \sim A.$$

Example 1: In the Montreal/Quebec example we can correctly conclude from $A \Longrightarrow B$ that $\sim B \Longrightarrow \sim A$: if $\sim B$, Mr. Smith does not live in Quebec, then $\sim A$ follows, he does not live in Montreal.

Example 2: In the arithmetic example we can correctly conclude that if $\sim B$, that is "a + b is not an even number", then $\sim A$ follows: it is not the case that both $a$ and $b$ are odd.

1.5 Proof by Contradiction

Reductio ad absurdum, which Euclid loved so much, is one of a mathematician's finest weapons. It is a far finer gambit than any chess play: a chess player may offer the sacrifice of a pawn or even a piece, but a mathematician offers the game. -G. H. Hardy

Proof by contradiction or 'reductio ad absurdum' involves proving a statement A by assuming the opposite $\sim A$ and deriving a contradiction. Thus if:
$$\sim A \Longrightarrow B \quad \text{and} \quad \sim A \Longrightarrow\; \sim B$$
then $\sim A$ must be false and hence A must be true.

Example:

He is unworthy of the name of man who is ignorant of the fact that the diagonal of a square is incommensurable with its side. -Plato
Consider proving that:

Theorem 5: $\sqrt{2}$ is irrational; that is, there are no integers $a$ and $b$ such that:
$$\sqrt{2} = \frac{a}{b}.$$

Proof. Let us assume, to the contrary, that $\sqrt{2}$ is rational, so that there exist integers $a$ and $b$ such that:
$$\sqrt{2} = \frac{a}{b}.$$
We can furthermore assume, without loss of generality, that $a$ and $b$ are not both even, since if $a = 2r$ and $b = 2s$ then:
$$\sqrt{2} = \frac{a}{b} = \frac{2r}{2s} = \frac{r}{s}.$$
For example if it were the case that $a = 8$ and $b = 6$ then, since $\frac{8}{6} = \frac{4}{3}$, we could use $a = 4$ and $b = 3$ instead. Now we have:
$$\sqrt{2} = \frac{a}{b} \Longrightarrow a^2 = 2b^2 \Longrightarrow a^2 \text{ is an even number} \Longrightarrow a \text{ is an even number}$$
since if $a$ were odd then $a^2$ would also be odd (you might want to prove this). Therefore we can write $a$ as $a = 2n$ where $n$ is some integer. Using this in $a^2 = 2b^2$ we have:
$$2b^2 = a^2 = (2n)^2 = 4n^2 \Longrightarrow b^2 = 2n^2 \Longrightarrow b \text{ is an even number.}$$
Thus both $a$ and $b$ are even numbers, which contradicts the requirement that $a$ and $b$ cannot both be even. Therefore the original assumption that $\sqrt{2}$ is rational must be false, so that $\sqrt{2}$ is irrational. QED

Remark: You will often see the letters 'QED' at the end of a proof. These letters stand for the Latin phrase "Quod erat demonstrandum", which means "that which was to be demonstrated". This just means that the proof is finished, so you should not be looking for further arguments. We use the symbol ∎ to indicate that a proof is finished.

1.6 Necessary Conditions and Sufficient Conditions

In mathematics you will often hear of "necessary conditions" and "sufficient conditions". For example a necessary condition for Mr. Smith to live in Montreal is that he live in the province of Quebec, while a sufficient condition for Mr. Smith to live in the province of Quebec is that he live in Montreal. However, living in Montreal is not a necessary condition for living in Quebec, and living in Quebec is not a sufficient condition for living in Montreal. We have:

Definition 6 (Necessary Conditions): If $\sim B \Longrightarrow \sim A$, or equivalently $A \Longrightarrow B$, then B is a necessary condition for A.

Definition 7 (Sufficient Conditions): If $A \Longrightarrow B$, or equivalently $\sim B \Longrightarrow \sim A$, then A is a sufficient condition for B.

Remark 1: Thus a necessary condition for A is one which necessarily must be satisfied if A is to be true, while if a sufficient condition for B is satisfied then this guarantees the truth of B.

Remark 2: Thus if we have proven a statement of the form $A \Longrightarrow B$, then it follows that A is a sufficient condition for B and B is a necessary condition for A.

Remark 3: It is important to realize that if A is sufficient for B it does not follow that A is necessary for B. Similarly, if B is necessary for A it does not follow that B is sufficient for A.

Example: We proved that "a is odd and b is odd" $\Longrightarrow$ "a + b is even". Thus a sufficient condition for the sum $a + b$ to be an even number is that $a$ and $b$ both be odd numbers, while a necessary condition for both $a$ and $b$ to be odd is that $a + b$ be even. However $a + b$ being an even number is not a sufficient condition for both $a$ and $b$ to be odd numbers, since $a + b$ can be even without $a$ and $b$ both being odd; for example, if $a = 2$ and $b = 4$ then $a + b = 6$. Nor is $a$ and $b$ both being odd a necessary condition for $a + b$ to be even; again, if $a = 2$ and $b = 4$ then $a + b = 6$.
1.7 Necessary and Sufficient Conditions

Sometimes it is possible to prove both $A \Longrightarrow B$ and $B \Longrightarrow A$. In this case A is a necessary and sufficient condition for B, and B is a necessary and sufficient condition for A, since it is then true that:
$$A \Longrightarrow B \quad \text{and} \quad \sim A \Longrightarrow\; \sim B$$
$$B \Longrightarrow A \quad \text{and} \quad \sim B \Longrightarrow\; \sim A.$$
We therefore have:

Definition 8: If $A \Longrightarrow B$ and $B \Longrightarrow A$ then A is a necessary and sufficient condition for B and we write $A \Longleftrightarrow B$.

Notice that with $A \Longleftrightarrow B$ the arrow points in both directions. This is to indicate that the truth of A is communicated to B just as the truth of B is communicated to A. If you can prove $A \Longleftrightarrow B$ then you have a much stronger statement than either $A \Longrightarrow B$ or $B \Longrightarrow A$ alone.

Example: Consider

A = "The integers a and b are odd numbers"
B = "The product of the integers, a × b, is an odd number".

Theorem 9: $a$ and $b$ are odd numbers if and only if $a \times b$ is an odd number, so that $A \Longleftrightarrow B$.

Proof. Suppose $a$ and $b$ are odd, so that $a = 2m + 1$ and $b = 2n + 1$. It follows that:
$$a \times b = (2m+1) \times (2n+1) = 4mn + 2m + 2n + 1 = 2(2mn + m + n) + 1$$
so that $a \times b$ is odd. Now suppose that B is true, so that $a \times b = 2m + 1$, and consider a proof by contradiction to show that A is true. Thus suppose A is false, so that one of $a$ and $b$ is even. Without loss of generality suppose $a$ is even, so that $a = 2n$ and hence:
$$2n \times b = 2m + 1 \Longrightarrow n \times b = m + \frac{1}{2}.$$
Now $n \times b$ is an integer but $m + \frac{1}{2}$ is not an integer, which is a contradiction. It follows then that A is true.

1.8 'Or' and 'And'

"You are sad," the Knight said in an anxious tone: "let me sing you a song to comfort you." "Is it very long?" Alice asked, for she had heard a good deal of poetry that day. "It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it - either it brings the tears into their eyes, or else -" "Or else what?" said Alice, for the Knight had made a sudden pause. "Or else it doesn't, you know." -Lewis Carroll, Through the Looking Glass

In life and in mathematics we often connect different statements using the word "or", which is denoted by ∨.

Example: If A is the statement "n is odd" and B is the statement "n > 10" then A ∨ B means that either n is odd or n is greater than 10. If n = 13 then A ∨ B would be true since n satisfies both A and B; it is odd and greater than 10. If n = 7 then A ∨ B would also be true since n satisfies A, and we do not need to satisfy B. Similarly if n = 22 then A ∨ B would be true because n satisfies B, and we then do not need to satisfy A. The only way A ∨ B can be false is if both A and B are false. Thus if n = 8 then A ∨ B would be false.

Remark: The use of "or" here is the "inclusive or", which is different from the "exclusive or" that your mother used when she said: "You can either have cake or pie", which meant you can have cake, you can have pie, but you cannot have both cake and pie. If your mother used the inclusive or then you could also have cake and pie.

Here are some results involving the connector ∨:

Theorem 10 (The Law of the Excluded Middle): For any statement A, the statement $A \vee \sim A$ is true.

Theorem 11: For any statements A and B: $(A \Longrightarrow B) \Longleftrightarrow (\sim A \vee B)$.

Another important connector of statements is "and", which is denoted by ∧. Thus in the previous example A ∧ B means n is odd and n is greater than 10. For the statement A ∧ B to be true it must be that both A and B are true.

Negating a statement involving ∨ is equivalent to negating each individual statement and changing the ∨ to ∧. Similarly, negating a statement involving ∧ is equivalent to negating each individual statement and changing the ∧ to ∨. Thus:

Theorem 12: For any statements A and B:
$$\sim(A \vee B) \Longleftrightarrow\; \sim A \,\wedge \sim B$$
$$\sim(A \wedge B) \Longleftrightarrow\; \sim A \,\vee \sim B.$$
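Because each of A and B is either true or false, Theorems 11 and 12 can be verified exhaustively: there are only four cases to examine. A short Python sketch of this check (the code is ours, added as an illustration):

```python
from itertools import product

# Exhaustive truth-table check of Theorems 11 and 12: with two
# statements there are exactly four truth-value combinations.
for A, B in product([True, False], repeat=2):
    # Theorem 11: (A => B) <=> (~A or B)
    a_implies_b = B if A else True      # "if A is true, B must be true"
    assert a_implies_b == ((not A) or B)
    # Theorem 12: De Morgan's laws
    assert (not (A or B)) == ((not A) and (not B))
    assert (not (A and B)) == ((not A) or (not B))
print("all four truth-value cases check out")
```

Unlike the numerical experiments elsewhere in this chapter, this check is itself a proof, since the four cases really do exhaust all possibilities.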
1.9 The Quantifiers ∃ and ∀

Sometimes in mathematics we use the quantifier ∃ to say that something exists. For example, to express the idea that the integer $a$ is odd we might write:
$$\exists n \,|\, (a = 2n + 1)$$
which says that there exists an integer $n$ such that $a = 2n + 1$. Other times we wish to make a universal statement, that all members of some class have a property, using the quantifier ∀. For example we might write:
$$\forall n \,|\, (n > n - 1)$$
which says that all integers $n$ are greater than $n - 1$. In intermediate mathematics and economics the symbols ∃ and ∀ are sometimes used as a convenient short-hand but are not that important. They do get used a lot in advanced mathematics and economics.

1.10 Proof by Counter-Example

In mathematics we are often led by our intuition to believe, without a proof, that something is always true. We therefore form a guess or a conjecture that this statement is always true.

Example: In the seventeenth century the French mathematician Fermat conjectured that if $n$ is an integer then all numbers of the form:
$$2^{2^n} + 1$$
are prime numbers. Thus with $n = 0, 1, 2, 3, 4$ we have:
$$2^{2^0} + 1 = 3,\quad 2^{2^1} + 1 = 5,\quad 2^{2^2} + 1 = 17,\quad 2^{2^3} + 1 = 257,\quad 2^{2^4} + 1 = 65537$$
and it is a fact that 3, 5, 17, 257 and 65537 are all prime numbers. (An integer $n$ is a prime number if its only divisors are 1 and $n$. Thus 5 is prime because only 1 and 5 divide evenly into 5, while 9 is not prime since $\frac{9}{3} = 3$.)

Since we do not know if a conjecture is true or false, there are two strategies for dealing with a conjecture:

1. Prove the conjecture is true.
2. Use a proof by counter-example to find one case where the conjecture is false.

The first strategy is generally the most difficult, since we have to prove that something holds for an infinite number of cases. For Fermat's conjecture it would very likely be a deep and difficult proof that would show that all numbers of the form $2^{2^n} + 1$ are prime. Of course if the conjecture really is true this is the only strategy that will lead to success. Often however you will find that no matter how hard you try you cannot prove a conjecture. In this case you might try the second strategy and search for a counter-example. If you are lucky this can be much easier since, unlike the first strategy, you only need one counter-example to prove the conjecture false.

Fermat died without being able to prove his conjecture. Later Euler was able to show that Fermat's conjecture is in fact false, since for $n = 5$:
$$2^{2^5} + 1 = 4294967297 = 641 \times 6700417$$
and so $2^{2^5} + 1$ is not prime.
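What took Euler real ingenuity can today be reproduced in a few lines. A Python sketch (ours, not part of the original argument) that confirms the counter-example:

```python
# Euler's counter-example to Fermat's conjecture: for n = 5 the
# number 2^(2^n) + 1 is composite, with 641 as a factor.
n = 5
F5 = 2 ** (2 ** n) + 1
print(F5)              # 4294967297
print(F5 % 641)        # 0, so 641 divides F5 evenly
print(641 * 6700417)   # 4294967297, recovering Euler's factorization
```

A single counter-example like this one settles the conjecture for good, which is what makes the second strategy so powerful when it works.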
1.11 Proof by Induction

Just the place for a Snark! I have said it twice: That alone should encourage the crew. Just the place for a Snark! I have said it thrice: What I tell you three times is true. -Lewis Carroll, The Hunting of the Snark

Often students attempt to prove results by simply listing the first few cases, verifying that the statement is true, and then putting a '...' or an 'etc.' afterwards. For example suppose you wished to prove the following conjecture:

Conjecture: The sum of the first $n$ integers is:
$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}.$$

Thus one might write:
$$1 = \frac{1(1+1)}{2} = 1, \quad 1 + 2 = \frac{2(2+1)}{2} = 3, \quad 1 + 2 + 3 = \frac{3(3+1)}{2} = 6, \quad \text{etc.}$$
and conclude that the statement is true. As we should now know, this is incorrect, since it does not exclude the possibility that the conjecture is false for $n = 4$ or at some really huge number like $n = 10^{10^{10}}$.

A correct method of proving this kind of conjecture is proof by induction. A proof by induction proceeds as follows. We are given a sequence of statements $S_1, S_2, \ldots$ and we want to prove that each $S_i$ is true. For example we wish to prove that $S_n$ is true where $S_n$ is the statement:
$$S_n = \text{“} 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \text{”}.$$

Proof by induction proceeds in two steps:

1. Prove that $S_1$ is true. This is usually trivial, involving nothing more than a mere calculation.
2. Assume that $S_1, S_2, \ldots, S_{n-1}$ are true (this is called the induction hypothesis), and use this to prove that $S_n$ is true.

A proof by induction is very much like setting up an infinite row of dominos. To get every domino to fall over, two things are needed. First, one must tip over the first domino to get the chain reaction started. This corresponds to the first step in the proof by induction. Next, the dominos must be spaced so that if one domino falls then its next neighbour must also fall. This corresponds to the second part of the proof. Together they imply that all of the dominos will fall down.

Example: Consider proving:
$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}.$$

Proof. The first step is to verify that it is true for $n = 1$, which is easy since:
$$1 = \frac{1(1+1)}{2}.$$
Now assume the induction hypothesis, that the statement is true up to $n - 1$, so that in particular:
$$1 + 2 + 3 + \cdots + (n-1) = \frac{(n-1)((n-1)+1)}{2} = \frac{n(n-1)}{2}.$$
Now we need to prove the statement is true for $S_n$. We have:
$$\underbrace{1 + 2 + 3 + \cdots + (n-1)}_{n(n-1)/2} + n = \frac{n(n-1)}{2} + n = n\left(\frac{n-1}{2} + 1\right) = n\left(\frac{(n-1)+2}{2}\right) = \frac{n(n+1)}{2}.$$
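For contrast with the (incorrect) listing strategy, here is what checking the first few thousand cases looks like in Python (a sketch of ours; the range chosen is arbitrary):

```python
# Check 1 + 2 + ... + n = n(n+1)/2 for the first few thousand n.
# This is evidence, not a proof: only the induction argument above
# covers every n at once.
for n in range(1, 5001):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("formula holds for n = 1, ..., 5000")
```

No matter how far such a loop runs, it can never rule out a failure just beyond its last case; that is precisely the gap induction closes.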
1.12 Functions

The basic mathematical object that we will be working with is a function, defined as:

Definition 13 (Function): A function $y = f(x)$ is a rule which assigns a unique number $y$ to every allowed value of $x$.

The key requirement here is that there be a unique $y$ for every $x$. For example the function $y = f(x) = x^2$ assigns to the value $x = 2$ the unique value $y = 2^2 = 4$. An example of something which is not a function is:
$$y = f(x) = \sqrt{x}$$
since to $x = 4$ it assigns two values, $y = 2$ and $y = -2$, while to $x = -4$ it assigns no value, since $\sqrt{-4}$ is not defined. Similarly, without any restrictions on $x$, $f(x) = \frac{1}{x}$ is not a function since $f(0) = \frac{1}{0}$ is not defined.

Implicit in any definition of a function is its domain and range:

Definition 14 (Domain): The domain of a function $y = f(x)$ is the set of all values of $x$ for which $f(x)$ is defined.

Definition 15 (Range): The range of a function $y = f(x)$ is the set of possible $y$ values over the domain of the function.

Often we can ensure that $f(x)$ is a function by restricting its domain and range.

Example: The problem with $f(x) = \sqrt{x}$ can be fixed by: 1) restricting the domain to be $x \ge 0$, and 2) restricting the range to be $y \ge 0$, or in other words interpreting $\sqrt{\;\;}$ as the positive square root (e.g. $\sqrt{4} = 2$ and not $-2$). With these restrictions we have a perfectly good function, as can be seen in the plot below:

[Figure: $y = \sqrt{x}$ for $x \ge 0$]

This is actually an example of a Cobb-Douglas production function, one of the workhorses of economic theory. Similarly, the problem with $f(x) = \frac{1}{x}$ can be fixed by restricting the domain to be $x > 0$, in which case we have:

[Figure: $f(x) = \frac{1}{x}$ for $x > 0$]

Remark: Quite often we define the range and domain in a way that ensures that the function makes economic sense. If for example $Q = f(P)$ is a demand function, with $P$ the price and $Q$ the quantity demanded, the domain of $f(P)$ will be $P \ge 0$ and the range $Q \ge 0$, since prices and quantities cannot be negative.

1.12.1 Integer Exponents

An important class of functions take the form:
$$f(x) = x^n$$
where $n$ is an integer. The meaning of $x^n$ for $n > 0$ is simply $x$ multiplied by itself $n$ times. For example $x^3 = x \times x \times x$. In this case we can allow the domain of $f(x)$ to be all $x$, that is $-\infty \le x \le \infty$.

We can also allow negative integer exponents (i.e., $-1, -2, -3, \ldots$). By $x^{-n}$ we mean $\frac{1}{x^n}$. For example:
$$x^{-3} = \frac{1}{x^3} = \frac{1}{x} \times \frac{1}{x} \times \frac{1}{x}.$$
Note that for negative integer exponents we need to exclude $x = 0$ from the domain of the function, since $\frac{1}{0}$ is not defined.

Integer exponents obey the following rules, which you might want to prove on your own:

Theorem 16: If $m$ and $n$ are either positive or negative integers then:
1. $x^m x^n = x^{m+n}$
2. $(x^m)^n = x^{mn}$
3. $x^0 = 1$
4. $x^{-n} = \frac{1}{x^n}$
5. $(xy)^n = x^n y^n$.

Proof. To prove 1, for example, we have:
$$x^m x^n = (\underbrace{x \times x \times \cdots \times x}_{m}) \times (\underbrace{x \times x \times \cdots \times x}_{n}) = \underbrace{x \times x \times \cdots \times x}_{m+n} = x^{m+n}.$$
The results 2 and 5 can be proven in a similar manner. To prove $x^0 = 1$ from 1) and 4): from 1) with $n = -m$ we have $x^m x^{-m} = x^{m-m} = x^0$, while from 4) we have $x^{-m} = \frac{1}{x^m}$, and so:
$$x^0 = x^m x^{-m} = x^m \frac{1}{x^m} = 1.$$

Remark: Note that $(x+y)^n \ne x^n + y^n$. For example with $n = 2$:
$$(x+y)^2 = x^2 + 2xy + y^2 \ne x^2 + y^2.$$

1.12.2 Polynomials

A polynomial is a weighted sum of the powers $x^n$, defined as:

Definition 17 (Polynomial): An $n$th degree polynomial is a function of the form:
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
where $a_n \ne 0$.

An important feature of a polynomial is its roots:

Definition 18: The roots of a function $f(x)$ are those values $r$ which satisfy $f(r) = 0$.

For a polynomial a root $r$ satisfies:
$$f(r) = a_n r^n + a_{n-1} r^{n-1} + \cdots + a_1 r + a_0 = 0.$$

One of the most important results in mathematics is that a polynomial of degree $n$ has $n$ (possibly complex) roots. This is important enough that it is called the Fundamental Theorem of Algebra. It was first proved by Gauss.

Theorem 19 (Fundamental Theorem of Algebra): An $n$th degree polynomial has $n$ roots $r_1, r_2, \ldots, r_n$; that is, $n$ (possibly complex) solutions to the equation $f(r) = a_n r^n + a_{n-1} r^{n-1} + \cdots + a_1 r + a_0 = 0$. (A complex number is of the form $a + bi$ where $i = \sqrt{-1}$.)

Two important special cases are:

Definition 20: A linear function is a 1st degree polynomial: $y = f(x) = ax + b$.

Definition 21: A quadratic is a 2nd degree polynomial: $y = f(x) = ax^2 + bx + c$.

Example 1: A 1st degree polynomial $f(x) = ax + b$ has one root, $r = -b/a$, as the solution to $f(r) = ar + b = 0$. Thus $f(x) = 4x + 8$ has a single root at $r = -2$, as illustrated below:

[Figure: $f(x) = 4x + 8$, crossing the $x$ axis at $r = -2$]

Example 2: We have:

Theorem 22: The quadratic $f(x) = ax^2 + bx + c$ has two roots $r_1$ and $r_2$ given by:
$$r_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad r_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$

Thus the quadratic $x^2 - 9x + 14$ has two roots:
$$r = \frac{-(-9) \pm \sqrt{(-9)^2 - 4(1)(14)}}{2}$$
or $r_1 = 2$ and $r_2 = 7$, as can also be seen in the graph below where $f(x)$ crosses the $x$ axis:

[Figure: $f(x) = x^2 - 9x + 14$]
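The formula of Theorem 22 translates directly into a few lines of code. A Python sketch (ours, restricted for simplicity to the case of real roots):

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 via Theorem 22 (real case only)."""
    d = math.sqrt(b * b - 4 * a * c)   # assumes b^2 - 4ac >= 0
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# The quadratic x^2 - 9x + 14 from Example 2:
r1, r2 = quadratic_roots(1, -9, 14)
print(r1, r2)   # 7.0 and 2.0, the two roots found above
```

When $b^2 - 4ac < 0$ the roots are complex and `math.sqrt` would fail; Python's `cmath.sqrt` handles that case if it is needed.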
An implication of the fundamental theorem of algebra is that a polynomial can always be factored as follows:

Theorem 23: Let $r_1, r_2, \ldots, r_n$ be the $n$ roots of the polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$. Then $f(x)$ can be factored as:
$$f(x) = a_n (x - r_1) \times (x - r_2) \times \cdots \times (x - r_n).$$

Example 1: The quadratic $f(x) = 3x^2 - 27x + 60$ has two roots $r_1 = 5$ and $r_2 = 4$, so that:
$$f(x) = 3(x-5)(x-4)$$
which you can verify by multiplying out the second expression.

Example 2: The cubic $x^3 - 19x^2 + 104x - 140$ has roots at $r_1 = 2$, $r_2 = 7$ and $r_3 = 10$, as can be seen in the graph below:

[Figure: $f(x) = x^3 - 19x^2 + 104x - 140$]

or by noting that it can be factored as:
$$x^3 - 19x^2 + 104x - 140 = (x-2)(x-7)(x-10).$$

1.12.3 Non-integer Exponents

In economics we will often want to consider non-integer exponents, that is:
$$f(x) = x^a$$
where $a$ is not an integer.

Example: Two functions with non-integer exponents are $f(x) = x^{0.3143}$ and $Q = f(L) = L^{1/2}$, where in the first case $a = 0.3143$ and in the second $a = 0.5$. The latter case is an example of a Cobb-Douglas production function.

For non-integer exponent functions $y = x^a$ we run into very difficult and deep mathematical waters if we allow either $x$ or $y$ to be negative. For example with $f(x) = x^{1/2}$, if we allow $x = -1$ then $f(-1) = \sqrt{-1}$ is not defined, while if $x = 4$ and we allow $y < 0$ then both $y = \sqrt{4} = 2$ and $y = \sqrt{4} = -2$. For this reason, whenever we work with $y = x^a$ with an exponent $a$ which is not an integer we always assume that $x > 0$ and that $y > 0$. With this qualification, non-integer exponents obey the same rules as integer exponents. Thus:

Theorem 24: If $x > 0$ and $a$ is any number (integer or non-integer, negative or positive) then $x^a$ is defined and:
1. $x^a > 0$
2. $x^a x^b = x^{a+b}$
3. $(x^a)^b = x^{ab}$
4. $(xy)^a = x^a y^a$
5. $x^0 = 1$
6. $x^{-a} = \frac{1}{x^a}$.

Often we will need to find the unique positive root of the function:
$$f(x) = Ax^b - c$$
for $x > 0$ and where $b$ is not an integer. We have:

Theorem 25: The unique positive root of $f(r) = Ar^b - c = 0$ is given by:
$$r = \left(\frac{c}{A}\right)^{1/b}.$$

Proof. Since $r$ satisfies $Ar^b = c$ we have:
$$r^b = \frac{c}{A}.$$
To get $r$ by itself we take both sides to the power $\frac{1}{b}$:
$$\left(r^b\right)^{1/b} = \left(\frac{c}{A}\right)^{1/b}$$
and since $\left(r^b\right)^{1/b} = r^{b \cdot \frac{1}{b}} = r^1 = r$ we have:
$$r = \left(\frac{c}{A}\right)^{1/b}.$$

Example: Given $f(x) = 10x^{7.3} - 23$, where $A = 10$, $b = 7.3$ and $c = 23$, we find that:
$$r = \left(\frac{23}{10}\right)^{1/7.3} = 1.120.$$
This can also be seen in the plot below:

[Figure: $f(x) = 10x^{7.3} - 23$]

1.12.4 The Geometric Series

An important result in economics is the geometric series:

Theorem 26 (Finite Geometric Series): If $x \ne 1$ then:
$$1 + x + x^2 + x^3 + \cdots + x^{n-1} = \frac{1 - x^n}{1 - x}.$$

Proof. Let $S$ be given by:
$$S = 1 + x + x^2 + x^3 + \cdots + x^{n-1}.$$
If we multiply $S$ by $x$ we obtain:
$$xS = x + x^2 + x^3 + \cdots + x^n$$
and if we subtract these two equations from each other we obtain:
$$S - xS = (1-x)S = \left(1 + x + x^2 + \cdots + x^{n-1}\right) - \left(x + x^2 + \cdots + x^n\right) = 1 - x^n$$
so that assuming $x \ne 1$ and solving for $S$ yields the finite geometric series.

Now consider letting $n \to \infty$. If $-1 < x < 1$ then $x^n \to 0$, so that:

Theorem 27 (Infinite Geometric Series): If $-1 < x < 1$ then:
$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots.$$
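It is instructive to watch the partial sums approach the limit. A Python sketch (ours; the choice $x = 0.9$ is arbitrary, giving the limit $1/(1-x) = 10$):

```python
# Partial sums of the geometric series for x = 0.9: by Theorem 26
# each equals (1 - x^n)/(1 - x), and by Theorem 27 both approach
# 1/(1 - x) = 10 as n grows.
x = 0.9
for n in [5, 10, 50, 200]:
    partial = sum(x ** k for k in range(n))   # 1 + x + ... + x^(n-1)
    closed_form = (1 - x ** n) / (1 - x)      # finite geometric series
    print(n, partial, closed_form)            # the two agree; both -> 10
```

The closer $x$ is to 1, the more terms are needed before the partial sums settle near the limit, which matters in the discounting applications that follow.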
Example 1: If $x = \frac{1}{2}$ then:
$$2 = \frac{1}{1 - \frac{1}{2}} = 1 + \frac{1}{2} + \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^3 + \cdots.$$
Thus if you have 2 pies in the fridge and each day you eat $\frac{1}{2}$ of the pie in the fridge, eating 1 pie the first day, $\frac{1}{2}$ of a pie the second, $\frac{1}{4}$ of a pie the third etc., you will eventually eat all the pie in the fridge.

Example 2: Suppose you have a bond that pays $a$ dollars a year forever and the interest rate is $r > 0$. The price of the bond is then the present discounted value:
$$P_B = \frac{a}{1+r} + \frac{a}{(1+r)^2} + \frac{a}{(1+r)^3} + \cdots.$$
We have:
$$P_B = \frac{a}{1+r}\Bigl(1 + \underbrace{\frac{1}{1+r}}_{x} + \underbrace{\frac{1}{(1+r)^2}}_{x^2} + \cdots\Bigr) = \frac{a}{1+r}\left(1 + x + x^2 + \cdots\right)$$
so the term in brackets is just the geometric series with $x = \frac{1}{1+r}$, where $0 < x < 1$. Then from the geometric series:
$$P_B = \frac{a}{1+r} \cdot \frac{1}{1-x} = \frac{a}{1+r} \cdot \frac{1}{1 - \frac{1}{1+r}} = \frac{a}{1+r} \cdot \frac{1+r}{r} = \frac{a}{r}.$$
Thus with an interest rate of $r = 0.05$/year (or 5% per year), a bond that paid $a = \$20$ per year forever would be worth:
$$P_B = \frac{\$20/\text{year}}{0.05/\text{year}} = \$400.$$

Chapter 2
Univariate Calculus

2.1 Derivatives

2.1.1 Slopes

You can think of a function $f(x)$ as a system of mountains and valleys, with $x$ denoting your position along the $x$ axis (say how far east or west from some point) and $y$ your height above the $x$ axis (or how high you are above sea level). This is illustrated below:

[Figure: Mountains and Valleys]

An important consideration for both a hiker and an economist then is the slope. Hikers clearly care if they are going uphill or downhill, and economists care if a function is upward sloping (as a supply curve) or downward sloping (as a demand curve).

The slope at any point $x$ can be measured by moving $\Delta x$ (say $\Delta x = 5$, or 5 to the right), measuring the change in elevation $\Delta y$ (say $\Delta y = -20$, or 20 feet down), and taking the ratio $\frac{\Delta y}{\Delta x}$ to get the slope (here $\frac{\Delta y}{\Delta x} = \frac{-20}{5} = -4$, so for every foot forward you fall 4 feet, with the negative sign indicating a downward slope). This leads to the following definition:

Definition 28 (Slope): The slope of $f(x)$ at $x$ for a given change in $x$, $\Delta x$, is denoted by $\frac{\Delta y}{\Delta x}$ and is:
$$\frac{\Delta y}{\Delta x} \equiv \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$

If $\frac{\Delta y}{\Delta x} > 0$ the function is upward sloping, so that increasing (decreasing) $x$ leads to an increase (decrease) in $y$. If $\frac{\Delta y}{\Delta x} < 0$ the function is downward sloping, so that increasing (decreasing) $x$ leads to a decrease (increase) in $y$.

Example: If $f(x) = x^2$ and we want to measure the slope at $x = 1$ with $\Delta x = 2$, then we obtain:
$$\frac{\Delta y}{\Delta x} = \frac{(x + \Delta x)^2 - x^2}{\Delta x} = \frac{(1+2)^2 - 1^2}{2} = 4.$$
On the other hand if we use $\Delta x = 0.25$ we obtain:
$$\frac{\Delta y}{\Delta x} = \frac{(1 + 0.25)^2 - 1^2}{0.25} = 2.25$$
while if we use $\Delta x = 0.001$ we obtain:
$$\frac{\Delta y}{\Delta x} = \frac{(1 + 0.001)^2 - 1^2}{0.001} = 2.001.$$
Note that as we make $\Delta x$ smaller the slope appears to be approaching 2. In general we have:

Theorem 29: For $f(x) = x^2$ the slope is:
$$\frac{\Delta y}{\Delta x} = 2x + \Delta x.$$
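The whole calculation in the example above fits in a few lines of Python (a sketch of ours, with names chosen for illustration):

```python
def slope(f, x, dx):
    """The slope of f at x for a step dx (Definition 28)."""
    return (f(x + dx) - f(x)) / dx

f = lambda x: x ** 2
# As dx shrinks, the slope at x = 1 approaches 2, in line with
# Theorem 29: the slope of x^2 is exactly 2x + dx.
for dx in [2, 0.25, 0.001, 1e-6]:
    print(dx, slope(f, 1.0, dx))
```

Running this reproduces the slopes 4, 2.25 and 2.001 computed above and pushes the step still smaller, which is exactly the limiting process taken up in the next section.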
2.1.2 Derivatives

And what are these derivatives? ... They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them ghosts of departed quantities? -George Berkeley

A problem with slopes is that we may get a different slope depending on which $\Delta x$ we choose. For example with $x^2$ at $x = 1$ the slope is $2 + \Delta x$, and so we obtained slopes of 2.25 and 2.001 for $\Delta x = 0.25$ and $\Delta x = 0.001$. Since we get different slopes for different $\Delta x$'s, one might wonder whether there is a best choice of $\Delta x$. This question has a surprising answer. It turns out that the best choice is to make $\Delta x$ zero, so that we let $\Delta x \to 0$.

Remark: Sometimes rather than 0 it is better to follow the earlier inventors of calculus and imagine that:
$$\Delta x \to dx$$
where $dx$ (note that $\Delta$, delta, is the Greek version of the letter d), known as an infinitesimal, is a quantity that is as close to zero as possible without actually being 0. As we make our step $\Delta x$ infinitesimally small, the amount by which we rise or fall will also get infinitesimally small, so that $\Delta y \to dy$. The ratio of the two, the slope, will however approach something sensible: the derivative of the function:
$$\frac{dy}{dx} = f'(x).$$
The use of infinitesimals was frowned upon by many, such as the English philosopher Berkeley. It was not until over a hundred years after the invention of calculus that Cauchy was able to provide foundations for calculus that did not require the use of infinitesimals. Infinitesimals are nevertheless a real aid to intuition, and especially in applied work they get used all the time.

Example: If $y = x^2$ we have from Theorem 29 that as $\Delta x \to 0$:
$$\frac{\Delta y}{\Delta x} = 2x + \Delta x \to 2x + dx = 2x$$
where we ignore the $dx$ because it is so small. This gives the well-known result, obtained by multiplying by the exponent and subtracting one from the exponent, that:
$$\frac{dy}{dx} = \frac{d}{dx}\left(x^2\right) = 2x.$$
Thus at $x = 1$ we obtain a slope of $\frac{dy}{dx} = f'(1) = 2$.

In general a derivative is defined as:

Definition 30 (Derivative): The derivative of a function $y = f(x)$, denoted by $f'(x)$ or $\frac{dy}{dx}$, is the limit of the slope as $\Delta x \to 0$, or:
$$\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$

A graphical depiction of the difference between a slope and a derivative is given below:

[Figure: a slope (a secant through two points) versus a derivative (the tangent at a point)]

The first rule you learn in calculus is how to calculate the derivative of $x^n$:

Theorem 31: Given $f(x) = x^n$ then:
$$f'(x) = n x^{n-1}.$$

Example: Given $f(x) = x^7$ then:
$$f'(x) = 7x^{7-1} = 7x^6.$$

2.1.3 The Use of the Word 'Marginal' in Economics

In economics we often use the word marginal: for example the marginal product of labour, the marginal utility of apples, the marginal propensity to consume, and so on. The original meaning of 'marginal' in, say, the marginal product of labour was the effect of adding one more unit of labour $L$, at the margin, to output $Q$. Translating this into mathematics, if we write the production function as $Q = f(L)$ then the marginal product of labour is:
$$MP_L \equiv \frac{\Delta Q}{\Delta L} = \frac{f(L+1) - f(L)}{1}$$
where $\Delta L = 1$ and $\Delta Q = f(L+1) - f(L)$. Thus the original marginal product of labour is the slope $\frac{\Delta Q}{\Delta L}$ when $\Delta L = 1$.

In advanced economics we want to have the tools of calculus at our disposal. For this reason it is much more convenient to use derivatives rather than slopes to measure the marginal product of labour. Consequently, instead of setting $\Delta L = 1$ and using $\frac{\Delta Q}{\Delta L}$, we let $\Delta L \to 0$ and use the derivative $\frac{dQ}{dL} = f'(L)$. This refinement of the notion of marginal extends to all marginal concepts, so that today:

Definition 32: In economics when we refer to marginal concepts we mean the derivative.

Example 1: Given the Cobb-Douglas production function:
$$Q = f(L) = L^{1/2}$$
the marginal product of labour is the derivative of $f(L)$, or:
$$MP_L(L) \equiv f'(L) = \frac{1}{2}L^{-1/2}.$$

Example 2: Given a utility function for, say, apples $Q$:
$$U(Q) = Q^{2/3}$$
the marginal utility of apples is:
$$MU(Q) \equiv U'(Q) = \frac{2}{3}Q^{-1/3}.$$
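The two notions of marginal, the one-more-unit slope and the derivative, are easy to compare numerically. A Python sketch (ours, using the production function of Example 1):

```python
# Marginal product of labour for Q = f(L) = L**0.5: compare the
# "one more worker" slope f(L+1) - f(L) with the derivative
# MP_L = 0.5 * L**(-0.5) used in advanced economics.
f = lambda L: L ** 0.5
for L in [1.0, 4.0, 100.0]:
    one_more = f(L + 1) - f(L)      # original marginal concept, dL = 1
    derivative = 0.5 * L ** (-0.5)  # calculus-based marginal product
    print(L, one_more, derivative)
```

At $L = 100$ the two agree to about three decimal places; for small $L$, where the function curves sharply over a unit step, they differ more noticeably.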
2.1.4 Elasticities

Often economists work with elasticities rather than derivatives. The problem with derivatives is that they depend on the units in which $x$ and $y$ are measured. Suppose for example we have a demand curve:
$$Q = 100 - 3P^{\$}$$
where the price $P^{\$}$ is measured in dollars, so that the derivative is $\frac{dQ}{dP} = -3$. If we decide to measure the price instead in cents, $P^{¢}$, we have, using $P^{\$} = \frac{P^{¢}}{100}$:
$$Q = 100 - 3\,\frac{P^{¢}}{100}$$
so now $\frac{dQ}{dP} = -\frac{3}{100}$. A change in units thus causes the derivative to change from $-3$ to $-\frac{3}{100}$.

Elasticities avoid this problem by working with percentage changes. While the slope is the change in $y$ divided by the change in $x$, or $\frac{\Delta y}{\Delta x}$, the elasticity $\eta$ is the percentage change in $y$, $\frac{\Delta y}{y} \times 100\%$, divided by the percentage change in $x$, $\frac{\Delta x}{x} \times 100\%$, or:
$$\eta \equiv \frac{\frac{\Delta y}{y} \times 100\%}{\frac{\Delta x}{x} \times 100\%} = \frac{\Delta y}{\Delta x}\,\frac{x}{y}.$$
Notice that here the elasticity $\eta$ is the slope $\frac{\Delta y}{\Delta x}$ multiplied by $\frac{x}{y}$. This is known as an arc elasticity and is typically used in elementary economics.

In more advanced economics we let $\Delta x \to 0$ and use the derivative $\frac{dy}{dx}$ rather than the slope $\frac{\Delta y}{\Delta x}$. This leads to the point elasticity, or simply the elasticity:

Definition 33 (Elasticity): The elasticity of the function $y = f(x)$ at $x$, denoted by $\eta(x)$, is:
$$\eta(x) \equiv \frac{dy}{dx}\,\frac{x}{y} \equiv f'(x)\,\frac{x}{f(x)}.$$

An easy way to remember the formula for the elasticity is to follow this recipe:

Elasticity Recipe
1. Write down the derivative as $\frac{dy}{dx}$.
2. Note that with the derivative, $y$ is upstairs and $x$ is downstairs. To obtain the elasticity put the $y$ downstairs and the $x$ upstairs as $\frac{x}{y}$, and multiply this with 1 as:
$$\eta = \frac{dy}{dx} \times \frac{x}{y}.$$
3. Now to obtain the elasticity as a function of $x$, replace $y$ with $f(x)$ to obtain:
$$\eta(x) = f'(x)\,\frac{x}{f(x)}.$$

Remark 1: In economics typically $x > 0$ and $y > 0$. This means that the derivative $f'(x)$ and the elasticity $\eta(x)$ always have the same sign. Thus if the elasticity of demand is negative, this is equivalent to saying the demand curve slopes downwards.

Remark 2: If $\eta(x) = -2$ then a 1% increase in $x$ leads to a 2% decrease in $y$. If $\eta(x) = 3$ then a 1% increase in $x$ leads to a 3% increase in $y$.

Remark 3: An elasticity can be calculated for any function $y = f(x)$, not just for demand curves.

Example 1: If $y = 4 - 2x$ (a demand curve perhaps, if $y = Q$ and $x = P$) then following the recipe:

1. We first write down the slope as $\frac{dy}{dx} = -2$.
2. Since there is a $y$ upstairs and an $x$ downstairs we multiply this by $\frac{x}{y}$ as: $\eta = -2\frac{x}{y}$.
3. To obtain the elasticity as a function of $x$ we replace $y$ with $4 - 2x$:
$$\eta(x) = -2\,\frac{x}{4 - 2x}.$$

Thus at $x = \frac{1}{2}$ we have:
$$\eta\left(\frac{1}{2}\right) = \left.\frac{-2x}{4-2x}\right|_{x=\frac{1}{2}} = -\frac{1}{3}$$
so that at $x = \frac{1}{2}$ a 1% increase in $x$ leads to a 0.33% decrease in $y$. Notice that while the derivative is $-2$ for all $x$, the elasticity decreases as $x$ increases, as shown in the plot below:

[Figure: $\eta(x) = \frac{-2x}{4-2x}$]

Example 2: If $y = x^2 + 5$ then following the recipe:

1. We first write down the slope as $\frac{dy}{dx} = 2x$.
2. Since there is a $y$ upstairs and an $x$ downstairs we multiply this by $\frac{x}{y}$ as: $\eta = 2x \times \frac{x}{y}$.
3. To obtain the elasticity as a function of $x$ we replace $y$ with $x^2 + 5$:
$$\eta(x) = 2x \times \frac{x}{x^2 + 5} = \frac{2x^2}{x^2 + 5}.$$
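The recipe is mechanical enough to code directly. A Python sketch for Example 1 (ours; the names are illustrative):

```python
# Point elasticity of y = 4 - 2x (Definition 33): eta = f'(x) * x / f(x).
f = lambda x: 4 - 2 * x
fprime = lambda x: -2.0                 # the derivative is constant
eta = lambda x: fprime(x) * x / f(x)

print(eta(0.5))    # -1/3: a 1% rise in x lowers y by about 0.33%
for x in [0.5, 1.0, 1.5]:
    print(x, eta(x))   # the slope never changes, but the elasticity does
```

The loop makes the closing point of Example 1 concrete: a constant derivative does not imply a constant elasticity.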
2.1.5 The Constant Elasticity Functional Form

Generally speaking the elasticity $\eta(x)$ changes with $x$. Now with slopes we know there exists a function $f(x) = ax + b$ where the derivative $f'(x) = a$ does not change with $x$, although the elasticity:
$$\eta(x) = f'(x)\,\frac{x}{y} = \frac{ax}{ax + b}$$
does change with $x$. Given the importance of elasticities in economics, a natural question to ask is whether there is a functional form which has the property that the elasticity $\eta(x)$ does not change with $x$. This is often convenient since it means, for example, that a demand curve has the same elasticity no matter what the price. The functional form which has this property is $f(x) = Ax^b$. In fact we have an even stronger result:

Theorem 34: A function $f(x)$ has the same elasticity for all $x$ if and only if it can be written as:
$$f(x) = Ax^b.$$

Proof. To prove that $f(x) = Ax^b$ has a constant elasticity, note that $f'(x) = bAx^{b-1}$ and consequently:
$$\eta(x) = f'(x)\,\frac{x}{f(x)} = bAx^{b-1}\,\frac{x}{Ax^b} = b.$$
To prove that $\eta(x) = b \Longrightarrow f(x) = Ax^b$ requires either integral calculus or differential equations, and so we omit it here.

Example 1: The demand curve $Q = 1000P^{-3}$ has the functional form $Ax^b$ and hence the elasticity of demand is $\eta(P) = -3$, the exponent on $P$.

Example 2: If we add a constant 10 to the demand curve in Example 1 as:
$$Q = 1000P^{-3} + 10$$
then the demand curve no longer has the functional form $Ax^b$ and so does not have the same elasticity for all $P$. In fact:
$$\eta(P) = \frac{-3000P^{-3}}{1000P^{-3} + 10} = \frac{-3}{1 + \frac{P^3}{100}}$$
and so changing $P$ changes the elasticity, as illustrated in the plot below:

[Figure: $\eta(P) = \frac{-3}{1 + P^3/100}$]
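Both examples can be checked with a numerical elasticity. A Python sketch (ours; the centered difference approximates the derivative, so results are accurate only to several decimal places):

```python
# The elasticity of Q = 1000 * P**(-3) is -3 at every price, while
# adding a constant (Q = 1000 * P**(-3) + 10) breaks that property.
def eta(f, P, h=1e-6):
    # numerical point elasticity: (dQ/dP) * P / Q
    dQdP = (f(P + h) - f(P - h)) / (2 * h)   # centered difference
    return dQdP * P / f(P)

demand1 = lambda P: 1000 * P ** (-3)
demand2 = lambda P: 1000 * P ** (-3) + 10
for P in [1.0, 2.0, 5.0]:
    print(P, eta(demand1, P), eta(demand2, P))
```

The first column of elasticities stays at $-3$ for every price, as Theorem 34 requires, while the second drifts toward 0 as $P$ rises, matching the formula $\eta(P) = -3/(1 + P^3/100)$.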
De…nition 40 Locally Decreasing: If f 0 (xo ) < 0 the function is locally decreasing (or downward sloping) at x = xo : De…nition 41 Globally Decreasing: If f 0 (x) < 0 for all x in the domain of f (x) then the function is globally decreasing. Example 1: Demand curves are globally downward sloping while supply curves are globally upward sloping or monotonic. Example 2: Consider the function graphed below: 1 0.8 0.6 0.4 0.2 0 2 4 6 x 8 10 12 : This function is locally increasing for at say x0 = 4 and in general for any x < 6: It is locally decreasing at x0 = 8; and in general for any x > 6: Since it increases for some x and decreases for others, it is neither globally increasing nor globally decreasing. CHAPTER 2. UNIVARIATE CALCULUS 34 Example 3: Consider the function graphed below: 50 40 30 20 10 0 1 2 x 3 4 : You can verify from the graph that this function is locally increasing at x0 = 1 and at x0 = 3: In fact it is increasing for all x and so this function is globally increasing or monotonic. 2.1.7 The Sum, Product and Quotient Rules There are a small number of rules to remember when calculating derivatives. Three of the more important rules are the sum, product rule and quotient rules given below: Theorem 42 Sum Rule: If h (x) = af (x)+bg (x) where a and b are constants then: h0 (x) = af 0 (x) + bg 0 (x) : Theorem 43 Product Rule: If h (x) = f (x) g (x) then h0 (x) = f 0 (x) g (x) + f (x) g 0 (x) : Theorem 44 Quotient Rule: If h (x) = h0 (x) = f (x) g(x) then g (x) f 0 (x) ¡ f (x) g 0 (x) g (x)2 : Example 1: Given f (x) = 3x5 + 4x3 we have from the sum rule that: d ¡ 3¢ d ¡ 5¢ x +4 x dx dx 4 2 = 15x + 12x : f 0 (x) = 3 CHAPTER 2. UNIVARIATE CALCULUS 35 Example 2: Given h (x) = x2 f (x) : then from the product rule we have: h0 (x) = 2xf (x) + x2 f 0 (x) : Example 3: Suppose that P (Q) is the inverse demand curve that a monopolist faces with P 0 (Q) < 0 so that the inverse demand curve slopes downwards. Total revenue (or sales) as a function of Q is then equal to R (Q) = P (Q) £ Q: Marginal revenue then is de…ned by: MR (Q) ´ R0 (Q) : Using the product rule we obtain: ¡ z }| { M R (Q) = P 0 (Q) Q + P (Q) since P 0 (Q) < 0 and Q > 0: It follows that: MR (Q) < P (Q) so that the marginal revenue curve is always less than price. This divergence between price and marginal revenue is the reason why a monopolist produces at a lower level of output than is socially optimal (more precisely Pareto optimal). For a perfectly competitive …rm on the other hand P does not depend on Q and hence is a constant. Since the derivative of a constant is 0 it follows that P 0 (Q) = 0 and hence: MR (Q) = P: Example 4: Suppose we want to calculate the derivative of h (x) = f (x) : x2 Then from the quotient rule we have h0 (x) = x2 f 0 (x) ¡ 2xf (x) (x2 )2 : CHAPTER 2. UNIVARIATE CALCULUS 36 Example 5: If C (Q) is the …rm’s cost function, then marginal cost is given by: MC (Q) ´ C 0 (Q) while average cost is AC (Q) ´ C (Q) : Q Di¤erentiating AC (Q) and using the quotient rule we …nd that: AC 0 (Q) = = = QC 0 (Q) ¡ C (Q) Q2 µ ¶ 1 C (Q) C 0 (Q) ¡ Q Q M C (Q) ¡ AC (Q) : Q From this we see that AC 0 (Q) > 0 () M C (Q) > AC (Q) AC 0 (Q) < 0 () M C (Q) < AC (Q) so that the or AC 0 (Q) > 0 when marginal cost exceeds average cost, and average cost curve is decreasing or AC 0 (Q) < 0 when marginal cost is less than average cost. 
Example 6: If C (Q) = 10Q2 + 20 then: MC (Q) = 20Q; AC (Q) = 20 10Q2 + 20 = 10Q + Q Q so that: p ´³ p ´ ¢ 10 ³ 20 10 ¡ 2 Q + Q ¡ 2 = = 2 Q ¡ 2 Q2 Q2 Q2 p so that AC p (Q) is falling for Q < 2 and hence AC > M C, AC (Q) is increasing for Q > 2pand hence AC < M C; and that marginal and average cost are equal when Q = 2; the minimum point of the average cost curve. You can see these AC 0 (Q) = 10 ¡ CHAPTER 2. UNIVARIATE CALCULUS 37 relationships below where the straight line is MC (Q): 100 80 60 40 20 1 2 Q 3 4 5 : MC (Q) and AC (Q) 2.1.8 The Chain Rule We will often be working with a function of a function. For example consider the function: h (x) = 1 : 1 + x2 We can think of h (x) as consisting of two functions: an outside function: f (x) = x1 and an inside function: g (x) = 1 + x2 ; that is: h (x) = f (g (x)) 1 = g (x) 1 : = 1 + x2 In general we have: De…nition 45 Given: h (x) = f (g (x)) we call f (x) the outside function and g (x) the inside function. At the moment we have no rule for …nding h0 (x) : Suppose however that we know how to calculate the derivative of the outside function f (x) and the inside function g (x). The chain rule then allows us to calculate the derivative of: h (x) = f (g (x)) as: Theorem 46 Chain Rule: If h (x) = f (g (x)) then h0 (x) = f 0 (g (x)) g 0 (x) : CHAPTER 2. UNIVARIATE CALCULUS 38 In the beginning it is common for students to have trouble with the chain rule. It should eventually become second nature but until then you might be better of being very systematic. The chain rule can be broken down into a recipe as follows: A Recipe for the Chain Rule 1. Identify the outside function f (x) and the inside function g (x) : (If you are not sure verify by putting the inside function g (x) inside f (x) as f (g (x)) and make sure you get h (x).) 2. Take the derivative of the outside function: f 0 (x). 3. Replace x in f 0 (x) in 2. with the inside function g (x) to obtain: f 0 (g (x)) : 4. Take the derivative of the inside function: g 0 (x). 5. Multiply the result in 3. by that in 4. to get: h0 (x) = f 0 (g (x)) g0 (x). Remark: It is important to correctly identify the outside and inside functions. If instead we were to put f (x) inside g (x) as g (f (x)) we obtain a di¤erent function. For example with f (x) = x1 and g (x) = 1 + x2 if instead of f (g (x)) one calculated: g (f (x)) = 1 + f (x)2 µ ¶2 1 = 1+ x 1 = 1+ 2 x which is not the same as f (g (x)) = Example 1: For h (x) = 1 1+x2 1 1+x2 : and following the recipe we have: 1. The outside function is f (x) = 1 x and the inside function is g (x) = 1 + x2 : 2. Taking the derivative of the outside function we obtain: f 0 (x) = ¡ x12 : 3. Putting the inside function inside the result in 2: we obtain: f 0 (g (x)) = ¡ 1 2 g (x) =¡ 1 (1 + x2 )2 : 4. Taking the derivative of the inside function we obtain: g 0 (x) = 2x: CHAPTER 2. UNIVARIATE CALCULUS 39 5. Multiplying 3: and 4: we obtain: à ! 1 2x £ |{z} 2x =¡ : h0 (x) = ¡ 2 2 (1 + x ) (1 + x2 )2 from step 4 | {z } from step 3 Example 2: For h (x) = p 1 + x4 we have: p 1 x 2 and thepinside function is g (x) = 1. The outside function is f (x) = x = p 4 1 + x : We verify this as: f (g (x)) = g (x) = 1 + x4 : 1 2. Taking the derivative of the outside function we obtain: f 0 (x) = 12 x¡ 2 = 1 p : 2 x 3. Putting the inside function inside the result in 2: we obtain: 1 1 : f 0 (g (x)) = p = p 2 1 + x4 2 g (x) 4. Taking the derivative of the inside function we obtain: g 0 (x) = 4x3 : 5. 
Multiplying 3: and 4: we obtain: µ ¶ 1 4x3 2x3 p £ |{z} 4x3 = p =p : h0 (x) = 4 4 2 1+x 2 1+x 1 + x4 | {z } from step 4 from step 3 2.1.9 Inverse Functions Given a function: y = f (x) we will often want to reverse x and y and make x the dependent variable and y the independent variable. For example we usually think of a demand curve as having Q, the quantity, as the dependent variable and P , the price, as the independent variable so we write Q = Q (P ). In some applications however it is easier to make P the dependent variable and Q the independent variable and write P = P (Q) ; which is the inverse demand curve. Example: Suppose we have the function y = f (x) = 6 ¡ 3x or changing the notation: Q = Q (P ) = 6 ¡ 3P and so we think of this as an ordinary demand curve. This demand curve treats quantity Q as the dependent variable and price P as the independent variable. CHAPTER 2. UNIVARIATE CALCULUS 40 Suppose we wanted instead to have P as the dependent variable or the inverse demand curve. By putting P on the left-hand sides as: Q = 6 ¡ 3P =) 3P = 6 ¡ Q we have: 1 P = P (Q) = 2 ¡ Q: 3 Translating back into the x; y notation the inverse demand curve takes the form: y = g (x) = 2 ¡ 13 x: The essence of an inverse function then is that we reverse the role of the independent variable x and the dependent variable y: Visually one obtains an inverse function by taking a graph and ‡ipping it around to put the y axis below and the x axis above. Thus a function and its inverse really express the same relationship between y and x: Of course in order to prove things about inverse functions we need to de…ne them. To see how this is done consider the demand curve and the inverse demand curve in the x; y notation: 1 f (x) = 6 ¡ 3x; g (x) = 2 ¡ x: 3 Suppose we put g (x) inside f (x) : Then we obtain a remarkable result: µ ¶ 1 f (g (x)) = 6 ¡ 3 2 ¡ x = x: 3 If instead we put f (x) inside g (x) then we get the same remarkable result: g (f (x)) = 2 ¡ 1 (6 ¡ 3x) = x: 3 In both cases we obtain x: This is in fact the basis for the de…nition of an inverse function: De…nition 47 Inverse Function: Given a function f (x) if there exists another function g (x) such that f (g (x)) = g (f (x)) = x then we say that g (x) is the inverse function of f (x) and f (x) is the inverse function of g (x) : Remark: If you think of x as trapped inside f (x) ; then applying the inverse function g (x) liberates x from f (x) since: g (f (x)) = x: CHAPTER 2. UNIVARIATE CALCULUS 41 Similarly f (x) liberates x from g (x) since: f (g (x)) = x: Often we when attempting to solve equations we will want to do just this, to get x outside by itself. In this case inverse functions are the tool we need. We have: Theorem 48 If f (x) = xn for x > 0 then the inverse function of f (x) is 1 g (x) = x n : Proof. If f (x) = xn then 1 1 g (f (x)) = g (xn ) = (xn ) n = xn£ n = x1 = x: 1 Example: If f (x) = x7 with x > 0 then the inverse function is g (x) = x 7 . Suppose you wish to solve the equation: f (x) = x7 = 3 for x; that is you wish to get x alone by itself. Using the inverse function to free x we …nd that: g (f(x)) = g (3) 1 =) x = 3 7 : Not all functions have an inverse function; in fact only globally increasing or decreasing functions have inverses: Theorem 49 Existence of an Inverse Function: The inverse function for f (x) exists if and only if f (x) is either globally increasing or globally decreasing. Example: The function f (x) = (x ¡ 1)2 does not have an inverse function since f 0 (x) < 0 for x < 1 and f 0 (x) > 0 for CHAPTER 2. 
UNIVARIATE CALCULUS 42 x > 1 as illustrated below: 4 3 y2 1 0 -1 1 2 x 3 f (x) = (x ¡ 1)2 The problem with this function is that if we ‡ip the graph around and make x the dependent variable as: 3 2 y 1 0 1 2 x 3 4 5 -1 Note that associated with each y are not one but two x0 s and so this is not a proper function. 2.1.10 The Derivative of an Inverse Function The question we now address is the relationship between the derivatives of two inverse functions f (x) and g (x) :We have: Theorem 50 Derivative of an Inverse Function: If f (x) has an inverse function g (x) then g 0 (x) is given by: g 0 (x) = 1 f 0 (g (x)) : CHAPTER 2. UNIVARIATE CALCULUS 43 Proof. Taking the derivative of both sides of f (g (x)) = x and using the chain rule we obtain: f 0 (g (x)) g 0 (x) = 1: Solving for g0 (x) then gives the result. Thus the slope of the inverse function g (x) is the inverse of the slope of the original function (with x replaced with g (x)). Example 1: We saw that demand and inverse demand curves written in x; y notation: 1 f (x) = 6 ¡ 3x; g (x) = 2 ¡ x 3 are inverses of each other. We have: f 0 (x) = ¡3 and g 0 (x) = ¡ 13 so that the two functions have derivatives which are inverses of each other. Example 2: If f (x) = x2 then f 0 (x) = 2x: The inverse function of f (x) is 1 g (x) = x 2 1 and so g 0 (x) = 12 x¡ 2 . Alternative we could calculate g 0 (x) from f (x) as: g0 (x) = = = = 2.1.11 1 f 0 (g (x)) 1 2 £ g (x) 1 1 2 £ x2 1 ¡1 x 2: 2 The Elasticity of an Inverse Function Suppose that f (x) has an inverse function: g (x) and that the corresponding elasticities are: ´ f (x) ´ We have: f 0 (x) x g 0 (x) x ; ´g (x) ´ : f (x) g (x) CHAPTER 2. UNIVARIATE CALCULUS 44 Theorem 51 If f (x) has an inverse function: g (x) then: ´g (x) = 1 : ´f (g (x)) Proof. Since g (x) is the inverse function of f (x) we have: x = f (g (x)) : Replacing x by f (g (x)) we obtain: ´g (x) = = Now replacing g 0 (x) by 1 f 0 (g(x)) g0 (x) x g (x) g0 (x) f (g (x)) g (x) from Theorem 50 we obtain: ´g (x) = = = = g0 (x) f (g (x)) g (x) f (g (x)) f 0 (g (x)) g (x) 1 ³ 0 ´ f (g(x))g(x) f (g(x)) 1 : ´f (g (x)) Example 1: Consider the function f (x) and its inverse g (x) given by: 1 f (x) = x3 ; g (x) = x 3 : Since both f (x) and g (x) are of the form Axb the elasticities of each are given by the exponents on x: Thus:´f (x) = 3 and ´g (x) = 13 = ´ 1(x) : f Example 2: Suppose a monopolist faces a demand curve: Q = Q (P ) that has an elasticity ´Q (P ) and that the inverse demand curve P = P (Q) has an elasticity: ´P (Q). Then from Theorem 51: ´P (Q) = 1 : ´Q (P (Q)) Now revenue for the monopolist as a function of Q is given by: R (Q) = Q £ P (Q) CHAPTER 2. UNIVARIATE CALCULUS 45 so that marginal revenue is from the product rule: MR (Q) = R0 (Q) = P (Q) + Q £ P 0 (Q) µ ¶ P 0 (Q) £ Q = P (Q) 1 + P (Q) = P (Q) (1 + ´ P (Q)) µ ¶ 1 = P (Q) 1 + : ´Q (P (Q)) Since the monopolist choose Q where MR = M C; and since M C > 0 it follows that µ ¶ 1 P (Q) 1 + > 0 =) ´ Q (P (Q)) < ¡1 ´ Q (P (Q)) so the monopolist always acts on the elastic part of the demand curve. 2.2 Second Derivatives He who can digest a second or third derivative ... need not, we think, be squeamish about any point of divinity. -George Berkeley Since the derivative f 0 (x) is a function, it too has a derivative, which is the second derivative of f (x). 
We have then: De…nition 52 Second Derivative: The second derivative of f (x) ; denoted d2 y d 0 0 by f 00 (x) or dx 2 or dx (f (x)) is the …rst derivative of f (x) : Example: Consider the function: f (x) = x3 ¡ x =) f 0 (x) = 3x2 ¡ 1: The second derivative f 00 (x) is then the …rst derivative of the …rst derivative or: f 0 (x) = 3x2 ¡ 1 =) f 00 (x) = 6x: 2.2.1 Convexity and Concavity Alice didn’t dare to argue the point, but went on: “and I thought I’d try and …nd my way to the top of that hill.” “When you say ‘hill’,” the Queen interrupted,” I could show you hills, in comparison with which you’d call that a valley.” “No, I shouldn’t,” said Alice, surprised into contradicting her at last: “a hill can’t be a valley, you know, That would be nonsense.” -Lewis Carroll Through the Looking Glass. CHAPTER 2. UNIVARIATE CALCULUS 46 While the sign of f 0 (x) tells you if the function is upward or downward sloping, the sign of f 00 (x) tells you whether you or standing on a mountain or in a valley or in mathematical jargon, whether the function is concave or convex. We have: De…nition 53 Local Concavity: The function f (x) is locally concave at xo (or locally mountain-like) if and only if f 00 (xo ) < 0: De…nition 54 (Global) Concavity: The function f (x) is (globally) concave (or globally mountain-like) if and only if f 00 (x) < 0 for all x in the domain of f (x) : De…nition 55 Local Convexity: The function f (x) is locally convex xo (or locally valley-like) if and only if f 00 (xo ) > 0: De…nition 56 (Global) Convexity: The function f (x) is (globally) convex (or globally valley-like) if and only if f 00 (x) > 0 for all x in the domain of f (x) : Concavity and convexity are fundamental concepts and so we will be referring to them often. It will quickly become tedious if we always have to qualify concavity and convexity with either ‘local’ or ‘global’. For this reason we will adopt the following convention: Convention: When we say a function is concave without saying ‘global’ or ‘local’, we mean the function is globally concave. Similarly if we say a function is convex without saying ‘global’ or ‘local’, we mean the function is globally convex. Example 1: Given f (x) = x3 ¡ x if x0 = ¡1 then f 0 (¡1) = 2 > 0 and f 00 (¡1) = ¡6 < 0 and so at x0 = ¡1; f (x) is locally increasing (or upward sloping) and locally concave (locally mountain-like ). At x0 = 1 f 0 (1) = 2 > 0 and f 00 (1) = 6 > 0 and so at x0 = 1 f (x) is locally increasing and locally convex (or locally valleylike). More generally since f 00 (x) = 6x it follows in general that f 00 (x) < 0 for x < 0 and hence f (x) is locally concave (or locally mountain-like) for x < 0: Similarly for x > 0 f 00 (x) > 0 and so f (x) is locally convex (or locally valleylike). CHAPTER 2. UNIVARIATE CALCULUS 47 Example 2: Given: f (x) = 1 = x¡1 for x > 0 x we have f 0 (x) = ¡x¡2 = ¡ 1 <0 x2 for all x: Hence f (x) is globally decreasing. Furthermore: f 00 (x) = 2x¡3 = 2 >0 x3 for all x and so that f (x) is globally convex. These properties of f (x) are illustrated in the plot below: 5 4 3 y 2 1 1 2 x f (x) = 3 4 5 : 1 x Example 3: Given the function: f (x) = ¡ 1 = ¡x¡1 for x > 0 x we have f 0 (x) = x¡2 = 1 >0 x2 for all x: Hence f (x) is (globally) increasing or monotonic. Furthermore: f 00 (x) = ¡2x¡3 = ¡ 2 <0 x3 CHAPTER 2. UNIVARIATE CALCULUS 48 for all x and so f (x) is globally concave or globally mountain-like. 
This is illustrated in the plot below: -1 -2 y -3 -4 -5 1 2 x f (x) = 2.2.2 3 4 5 : ¡ x1 Economics and ‘Diminishing Marginal ...’ In economics one often hears the expression: ‘diminishing marginal ...’. Recall that the marginal is the derivative f 0 (x) : Thus if the marginal is decreasing the …rst derivative of f 0 (x) or f 00 (x) must be negative or d (f 0 (x)) < 0: dx Thus stating that the marginal is decreasing is equivalent to stating that f 00 (x) < 0 or that the function is concave (mountain-like). f 00 (x) = Example: The Cobb-Douglas production function: p Q = f (L) = L plotted below: 5 4 Q3 2 1 0 5 10 L 15 20 The Production Function: Q = 25 p L CHAPTER 2. UNIVARIATE CALCULUS 49 has a marginal product of labour: 1 M PL (L) = f 0 (L) = p 2 L which although positive, decreases as L increases as plotted below: 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 5 10 L 15 MPL (L) = 20 25 : 1 p 2 L Equivalently M PL0 (L) = f 00 (L) = ¡1 ³p ´3 < 0 4 L and so a diminishing marginal product of labour is equivalent to the production function being concave or mountain-like. 2.3 2.3.1 Maximization and Minimization First-Order Conditions The cornerstone of economic thinking is that people are assumed to be rational. Rational behavior generally means maximizing or minimizing something. Thus rational households maximize utility and rational …rms maximize pro…ts. Both pro…t maximizers and utility maximizers must minimize costs. A maximum (minimum) is found on the top of a mountain (bottom of a valley) at a point where the mountain (valley) is ‡at. If it were not ‡at then you could always go a little higher (lower) by either moving up or down the slope. This intuition leads to the …rst-order conditions for a maximum or minimum: Theorem 57 First-Order Condition for a Maximum. If f (x) is maximized at x = x¤ then f 0 (x¤ ) = 0: CHAPTER 2. UNIVARIATE CALCULUS 50 Theorem 58 First-Order Condition for a Minimum. If f (x) is minimized at x = x¤ then f 0 (x¤ ) = 0: Remark: Calculating …rst-order conditions is one of the most basic skills that an economist must have. Although this is often straightforward, there are nevertheless common problems that occur when students do not derive them systematically. For this reason you may wish at the beginning to use the following recipe: Recipe for Calculating First-Order Conditions Given a function f (x) to be maximized or minimized: 1. Calculate the …rst derivative f 0 (x) : 2. Replace all occurrences of x in 1: with x¤ and the resulting expression equal to 0: 3. If possible solve the expression in 2. for x¤ or if this is not possible, try and learn something about x¤ from the …rst-order conditions. Example 1: Consider the function: f (x) = x3 ¡ x Following the recipe we have: 1. Calculate the derivative of f (x) : f 0 (x) = 3x2 ¡ 1: 2. Put a ¤ on x in 1: and set the result equal to 0: Thus: f 0 (x¤ ) = 3 (x¤ )2 ¡ 1 = 0: 3. Solving the expression in 2: we …nd that: 3 (x¤ )2 ¡ 1 = 0 =) (x¤ )2 = 1 3 1 1 =) x¤1 = p and x¤2 = ¡ p : 3 3 Example 2: Consider the function f (x) = x1=2 ¡ x with domain x ¸ 0: Following the recipe we have: CHAPTER 2. UNIVARIATE CALCULUS 51 1. Calculate the derivative of f (x) : 1 f 0 (x) = x¡1=2 ¡ 1: 2 2. Put a ¤ on x in 1: and set the result equal to 0: Thus: f 0 (x¤ ) = 1 ¤ ¡1=2 (x ) ¡ 1 = 0: 2 3. Solving the expression in 2. 
we …nd that: 1 ¤ ¡1=2 (x ) ¡1 2 = 0 =) (x¤ )¡1=2 = 2 1 =) x¤ = 2¡2 = : 4 2.3.2 Second-Order Conditions The …rst-order conditions for a maximum and a minimum are identical and so just from: f 0 (x¤ ) = 0 we have no way of knowing whether x¤ is a maximum or a minimum. This is then where the second derivative f 00 (x) and the secondorder conditions become useful. The basic principle is that mountains (concave functions with f 00 (x) < 0 ) have tops or maxima and valleys (convex functions with f 00 (x) > 0 ) have bottoms or minima. For the moment we will only deal with local maxima and minima. If x¤ satis…es the …rst-order conditions: f 0 (x¤ ) = 0 and at x¤ the function is locally mountain-like, then x¤ must be a local maximum. If at x¤ the function is locally valley-like, then x¤ must be a local minimum. Consequently we have: Theorem 59 Second-Order Conditions for a Local Maximum. If f 0 (x¤ ) = 0 and f 00 (x¤ ) < 0 (i.e. f (x) is locally concave or mountain-like at x¤ ) then x¤ is a local maximum. Theorem 60 Second-Order Conditions for a Local Minimum. If f 0 (x¤ ) = 0 and f 00 (x¤ ) > 0 (i.e. f (x) is locally convex or valley-like at x¤ ) then x¤ is a local minimum. Example 1 (continued): For the function: f (x) = x3 ¡ x the solutions to the …rst-order conditions are: x¤1 = lating f 00 (x) we have: p1 3 and x¤2 = ¡ p13 : Calcu- f 0 (x) = 3x2 ¡ 1 =) f 00 (x) = 6x: CHAPTER 2. UNIVARIATE CALCULUS 52 To begin consider x¤1 = p13 we …nd that: ¶ µ 1 1 f 00 p = 6 £ p = 3:4641 > 0 3 3 so that f (x) is locally convex at x¤1 and hence from the second-order conditions x¤1 = p13 is a local minimum. For x¤2 = ¡ p13 we …nd that ¶ µ ¡1 1 f 00 ¡ p = 6 £ p = ¡3:4641 < 0 3 3 so that f (x) is locally concave at x¤2 and hence from the second-order conditions x¤2 = ¡ p13 is a local maximum. Example 2 (continued): For the function: f (x) = x1=2 ¡ x the solution to the …rst-order conditions is: x¤ = 14 : Now since ¡ ¢¡3=2 µ ¶ ¡ 14 x¡3=2 1 00 00 =) f = ¡2 < 0 f (x) = ¡ = 4 4 4 it follows that f (x) is locally concave (or mountain-like) at x¤ = x¤ = 14 is a local maximum. 2.3.3 1 4 and hence Su¢cient Conditions for a Global Maximum or Minimum In economics we are usually only interested in global maxima or minima. A pro…t maximizing …rm would not chose a local pro…t maximum if it were not a global maximum. If x¤ satis…es the …rst and second-order conditions then all we can say is that x¤ is a local maximum or a local minimum. We do not know if we are at a global maximum or a global minimum. If a function f (x) is globally concave then it is everywhere mountain-like; essentially f (x) is one mountain. Now if you …nd a ‡at spot on this one mountain it must be a global maximum, there can be no higher point on the mountain. . Similarly if a function f (x) is globally convex then it is everywhere valleylike so that essentially f (x) is one valley. Now if you …nd a ‡at spot on this one valley it must be a global minimum. Thus local concavity or convexity insures that if f 0 (x¤ ) = 0 then x¤ is a local maximum or minimum. Global concavity or convexity on the other hand insures that x¤ is a global maximum or minimum. Theorem 61 If a function f (x) is globally concave so that f 00 (x) < 0 for all x and x¤ satis…es the …rst-order conditions: f 0 (x¤ ) = 0 then x¤ is a unique global maximum. CHAPTER 2. UNIVARIATE CALCULUS 53 Theorem 62 If a function f (x) is globally convex so that f 00 (x) > 0 for all x and x¤ satis…es the …rst-order conditions: f 0 (x¤ ) = 0 then x¤ is a unique global minimum. 
Note that this is a su¢cient condition for x¤ to be a global maximum (minimum), it is not a necessary condition for a global maximum (minimum). As we shall see there are functions which are not globally concave or convex which nevertheless have a unique global maximum or minimum. Example 1: The function: f (x) = x3 ¡ x actually has no global maximum or minimum since f (x) ! §1 as x ! §1 as shown in the plot below: 6 4 y 2 -2 -1 0 1 x 2 -2 -4 -6 : 3 f (x) = x ¡ x However if we restrict the domain to be x > 0 then since x > 0: f 00 (x) = 6x > 0 and so the function is globally convex. We saw there were two solutions to the …rst-order conditions: x¤1 = p13 and x¤2 = ¡ p13 : Since x¤2 is negative it is not now in the domain of f (x) so that CHAPTER 2. UNIVARIATE CALCULUS x¤1 = p1 3 54 is the unique global minimum. This is illustrated in the plot below: 6 5 4 y3 2 1 0 0.5 1 x 1.5 2 : f (x) = x3 ¡ x for x > 0 Example 2: Consider the function f (x) = x1=2 ¡ x with domain x ¸ 0 which is plotted below: 0.2 0 0.5 x 1 1.5 2 -0.2 y -0.4 -0.6 : f (x) = x1=2 ¡ x Now since 1 f 00 (x) = ¡ x¡3=2 < 0 4 for all x (since x¡3=2 > 0) it follows that f (x) is globally concave and x¤ = is the unique global maximum. 1 4 CHAPTER 2. UNIVARIATE CALCULUS 2.3.4 55 Pro…t Maximization Then I looked on all the works that my hands had wrought, and on the labour that I had laboured to do; and, behold, all was vanity and a striving after wind, and there was no pro…t under the sun. -Ecclesiastes 2:11 One should always generalize. -Karl Jacobi Consider the problem of a …rm maximizing pro…ts in the short-run. We are going to look at this problem from various levels of generality, beginning with a very simple special case and from there increasing the level of generality. Often in textbooks one sees the most general case …rst and then one looks at speci…c examples. Knowledge in real life does not usually progress in this manner; rather new ideas typically begin with special cases from which a researcher develops some understanding and curiosity. From there he or she then attempts to generalize the results of the special case. Example 1: Consider the least general case of a …rm with Cobb-Douglas production function where: 1 Q = f (L) = L 2 : Pro…ts for the …rm are: ¼ (L) = P f (L) ¡ W L 1 = PL2 ¡ WL where P is the price the …rm receives and W is the nominal wage. Di¤erentiating with respect to L we …nd that: 1 1 ¼0 (L) = P L¡ 2 ¡ W 2 so that putting a ¤ on L and setting the derivative equal to 0 yields the …rst-order condition for pro…t maximization: 1 1 P L¤¡ 2 ¡ W = 0: 2 If we de…ne w = W P as the real wage then this implies that: 1 ¤ ¡ 12 (L ) =w 2 or: 1 L¤ = w¡2 : 4 CHAPTER 2. UNIVARIATE CALCULUS 56 This gives the pro…t maximizing L¤ as a function of the real price of labour w and so is the …rm’s labour demand curve. Since L¤ has the functional form Axb the elasticity of demand is the exponent on w or ¡2. 6 5 4 L3 2 1 0 0.2 0.4 0.6 0.8 1 w 1.2 1.4 1.6 1.8 2 L¤ (w) = 14 w¡2 Furthermore L¤ is a global maximum since: 1 3 ¼00 (L) = ¡ P L¡ 2 < 0 4 and hence ¼ (L) is globally concave. If P = 4 and W = 2 then w = 24 = 12 and the …rm would hire 1 1 1 L¤ = w¡2 = ¡ ¢¡2 = 1 4 4 1 2 or one worker. If there were a 100% in‡ation so that P and W doubled to P = 8 and W = 4 then the real wage remains the same as w = 12 and so L¤ stays at L¤ = 1: This re‡ects the fact that a rational …rm only cares about the real wage when hiring. 
The …rm’s supply curve is found by substituting the optimal amount of 1 labour L¤ = 14 w¡2 into the production function: Q = f (L) = L 2 so that: µ ¶1 1 ¡2 2 ¤ w Q = 4 1 ¡1 w = 2 µ ¶¡1 1 W = 2 P 1 p: = 2 ¤ P 1 where p = W is the real price of the good Q: Thus: dQ dp = 2 > 0 and the supply curve slopes upwards. Note that the supply curve has the form Axb and so the elasticity of supply is the exponent on p or 1: CHAPTER 2. UNIVARIATE CALCULUS 57 Example 2: We have found that the …rm’s labour demand curve slopes downwards and its supply curve slopes upwards, that both depend on the real prices w and p, and both have a constant elasticity. Suppose you were the …rst to actually derive these results. If you are curious you would then ask yourself if these results are merely a coincidence or indicative of more general results? We therefore seek to generalize. To do this we could replace the exponent: 12 1 exponent on Q = L 2 with say 13 and redo the analysis. Once this done we could 1 then replace 3 with 25 and so on. The problem with this approach is that there are an in…nite number of possible exponents on L we could use so we would never arrive at any …rm conclusions. Suppose instead we replace 12 not with a number but with a letter ® and write the …rm’s production function as: Q = f (L) = AL® : where A > 0 and where we assume that: 0 < ® < 1: By doing this we will be able to analyze an in…nite number of possible exponents at once! The assumptions on ® insure a positive and diminishing marginal product of labour since: M PL (L) ´ f 0 (L) = ®AL®¡1 > 0 and: ¡ z }| { dM PL (L) = ®(® ¡ 1)AL®¡2 < 0 f (L) = dL 00 since ® < 1. Pro…ts for the …rm are given by: ¼ (L) = P f (L) ¡ W L = P AL® ¡ W L so that: ¼ 0 (L) = ®P AL®¡1 ¡ W: Putting a ¤ on L and setting the result equal to 0 yields the …rst-order condition for pro…t maximization: ®¡1 ®P A (L¤ ) ¡ W = 0: Solving for L¤ we obtain the labour demand curve: 1 1 L¤ = (®A) 1¡® w ®¡1 CHAPTER 2. UNIVARIATE CALCULUS 58 where w is the real wage. This has the form Axb and so the elasticity of demand is the exponent on w or: 1 <0 ®¡1 since ® < 1: It follows then that the labour demand curve slopes downwards. Note this includes the results of the previous example as a special case since if ® = 12 the elasticity becomes 1 2 1 = ¡2: ¡1 This is a global pro…t maximum since: ¡ z }| { ¼ (L) = P ®(® ¡ 1)AL®¡2 < 0 00 so that ¼ (L) is globally concave. The supply curve is found by substituting L¤ into Q = f (L) so that: ´® ³ 1 1 Q¤ = f (L¤ ) = A (®A) 1¡® w ®¡1 1 ® 1 ® ® = A 1¡® ® 1¡® w ®¡1 ® µ ¶ ®¡1 1 ® W = A 1¡® ® 1¡® P ® = A 1¡® ® 1¡® p 1¡® ® = Bp 1¡® 1 ® P where: B = A 1¡® ® 1¡® and p = W is the real price of Q: This also has the form b Ax and so the elasticity of supply is the exponent on p: ® >0 1¡® and so the supply curve slopes upwards. This includes the previous result as a special case since if ® = 12 the elasticity of supply is 1 2 1¡ 1 2 = 1: Example 3: We see from the previous example that the elasticities change when we change ® from 12 but that the …rm’s labour demand curve still slopes downward and the supply curve still slopes upward. If however we allowed ® > 1 then the expression for the labour demand curve elasticity would become positive; that is: 1 > 0: ®¡1 CHAPTER 2. UNIVARIATE CALCULUS 59 However ® > 1 would also mean that we would no longer have a diminishing marginal product of labour and the pro…t function would no longer be concave but instead would be convex. 
All these clues point to the fact that it is the diminishing marginal product of labour that is the key requirement in obtaining a downward sloping labour demand and upward sloping supply curves. We now attempt to generalize even further. Consider replacing Q = AL® with: Q = f (L) where the only assumptions we now make are that the marginal product of labour is positive and diminishing so that: M PL (L) = f 0 (L) > 0 and: M PL0 (L) = f 00 (L) < 0: Pro…ts as a function of L are then given by: ¼ (L) = P f (L) ¡ W L Di¤erentiating with respect to L we …nd that: ¼ 0 (L) = P f 0 (L) ¡ W so that putting a ¤ on L and setting the derivative equal to 0 yields the …rst-order condition for pro…t maximization: ¼0 (L¤ ) = 0 =) P f 0 (L¤ ) ¡ W = 0 =) M PL (L¤ ) = w: This result shows that the inverse labour demand curve is in fact the marginal product of labour curve. Furthermore it shows that labour demand L¤ is a function of the real wage w: Since ¼00 (L) = P f 00 (L) < 0 for all L; the pro…t function is globally concave and L¤ is a global maximum. Consider now the problem of showing that the labour demand curve L¤ = ¤ L (w) is downward sloping. The labour demand curve is implicitly de…ned by: M PL (L¤ (w)) = w: ¤ ¤ We would like …nd the derivative dLdw(w) and show that: dLdw(w) < 0; that is that the demand for labour slopes downwards. The problem is that L¤ (w) is trapped inside the marginal product of labour M PL (L) and so we cannot get at ¤ it directly. We can however use the chain rule to get an expression for dLdw(w) : Here MPL (L) is the outside function and L¤ (w) is the inside function. Thus di¤erentiating both sides with respect to the real wage w we obtain: d d dL¤ (w) M PL (L¤ (w)) = w =) M PL0 (L¤ (w)) = 1: dw dw dw CHAPTER 2. UNIVARIATE CALCULUS 60 ¤ Note that the chain rule forces dLdw(w) outside where we can now work with it. Since we assume a diminishing marginal product of labour or MPL0 (L) < 0 we conclude that: 1 dL¤ (w) = <0 0 dw M PL (L¤ (w)) which shows then that the labour demand curve is downward sloping. The …rm’s supply curve is given by replacing L with L¤ (w) in the production function as: ¢¢ ¡ ¡ Q¤ = f (L¤ (w)) = f L¤ p¡1 : P where p = W and hence p¡1 = W P = w: Consider now showing that the supply ¤ > 0: Here p is buried deep inside: f (L) ; L¤ (w) curve slopes upwards or: dQ dp ¡1 and p so we will need to use the chain rule three times!. We have: dQ¤ dp = f 0 (L¤ (w)) dL¤ (w) dp¡1 dw dp ¡ ¡ z }| { z}|{ + z }| { dL¤ (w) 1 0 ¤ £¡ 2 = f (L (w)) dw p > 0: Thus the …rm’s supply curve is upward sloping. This problem illustrates how one can obtain very general results with very minimal assumptions. We have shown that given only a positive and diminishing marginal product of labour that labour demand must slope downwards and supply must slope upwards. Furthermore demand and supply are functions of the real prices w and p: 2.4 Econometrics Econometrics is the bridge between theory and real life. -James Ramsey 2.4.1 Least Squares Estimation Estimation of a Constant: ¹ Consider the simple linear regression model: Yi = ¹ + ei ; i = 1; 2; : : : n: Here ei is random noise. If it were not for this random noise each Yi would be identical as: Yi = ¹: Since the data is corrupted however we do not get to CHAPTER 2. UNIVARIATE CALCULUS 61 directly observe ¹ but only a sample of the Yi 0 s . Our problem is to guess what ¹ is. 
^ ; the least squares estimator of ¹ which minimizes the sum of Our guess is ¹ squares function: S (¹) = n X i=1 (Yi ¡ ¹)2 = (Y1 ¡ ¹)2 + (Y2 ¡ ¹)2 + ¢ ¢ ¢ + (Yn ¡ ¹)2 : ^ is x¤ : The function: S (¹) is in fact Here ¹ plays the role of the x variable and ¹ just a quadratic. Using the sum and chain rule to di¤erentiate S (¹) we have: S 0 (¹) = ¡2 (Y1 ¡ ¹) ¡ 2 (Y2 ¡ ¹) ¡ ¢ ¢ ¢ ¡ 2 (Yn ¡ ¹) = ¡2 (Y1 + Y2 + ¢ ¢ ¢ + Yn ¡ n¹) : It follows then that the …rst-order conditions require: S 0 (^ ¹) ¹) = 0 = ¡2 (Y1 + Y2 + ¢ ¢ ¢ + Yn ¡ n^ =) n^ ¹ = Y1 + Y2 + ¢ ¢ ¢ + Yn Y1 + Y2 + ¢ ¢ ¢ + Yn = Y¹ : =) ¹ ^= n Thus our best guess of ¹ is the sample mean of the Yi 0 s. S (¹) is globally convex since: S 00 (¹) = |2 + 2 + {z¢ ¢ ¢ + 2} n times = 2n > 0 ^ is in fact a global minimum of S (¹). and so ¹ Example: Suppose one has data on the consumption of n = 4 families: i: 1 2 3 4 Yi = 72 58 63 55: Here each family consumes di¤erent amounts than ¹ because of random noise ei (e.g., unexpected dental bills). To …nd ¹ ^ we construct the sum of squares function: S (¹) = (72 ¡ ¹)2 + (58 ¡ ¹)2 + (63 ¡ ¹)2 + (55 ¡ ¹)2 CHAPTER 2. UNIVARIATE CALCULUS 62 which is plotted below: 7000 6000 5000 4000 3000 2000 1000 20 40 60 mu 80 100 : S (¹) As illustrated in the graph, the minimum of S (¹) occurs at the sample mean 1 ¹ ^ = Y¹ = (72 + 58 + 63 + 55) = 62: 4 Linear Regression Now suppose Yi varied systematically with another variable, called a regressor: Xi as: Yi = Xi ¯ + ei ; i = 1; 2; : : : n: For example if Yi is the consumption of family i then Xi might be their income dYi . We so that ¯ would be the marginal propensity to consume since: ¯ = dX i cannot again directly observe ¯ from the data because the data is corrupted by the random noise ei : Instead we estimate ¯ from a set of n observations on Yi ^ which minimizes: and Xi using the least squares estimator ¯ S (¯) = n X i=1 (Yi ¡ Xi ¯)2 = (Y1 ¡ X1 ¯)2 + (Y2 ¡ X2 ¯)2 + ¢ ¢ ¢ + (Yn ¡ Xn ¯)2 : Using the sum and chain rule to di¤erentiate S (¯) we have: S 0 (¯) = ¡2X1 (Y1 ¡ X1 ¯) ¡ 2X2 (Y2 ¡ X2 ¯) ¡ ¢ ¢ ¢ ¡ 2Xn (Yn ¡ Xn ¯) CHAPTER 2. UNIVARIATE CALCULUS 63 so that the …rst-order conditions require: ³ ´ ³ ³ ³ ´ ´ ´ ^ ^ ¡ 2X2 Y2 ¡ X2 ¯ ^ ¡ ¢ ¢ ¢ ¡ 2Xn Yn ¡ Xn ¯ ^ = 0 = ¡2X1 Y1 ¡ X1 ¯ S0 ¯ ³ ³ ³ ´ ´ ´ ^ + X2 Y2 ¡ X2 ¯ ^ + ¢ ¢ ¢ + Xn Yn ¡ Xn ¯ ^ =0 =) X1 Y1 ¡ X1 ¯ ¢ ¡ ^ = X1 Y1 + X2 Y2 + ¢ ¢ ¢ + Xn Yn =) X12 + X22 + ¢ ¢ ¢ + Xn2 ¯ ^ = X1 Y1 + X2 Y2 + ¢ ¢ ¢ + Xn Yn : =) ¯ X12 + X22 + ¢ ¢ ¢ + Xn2 S (¯) is globally convex as long as Xi 6= 0 for at least one i (i.e., at least one family has a non-zero income) since: S 00 (¯) = 2X12 + 2X22 + ¢ ¢ ¢ + 2Xn2 > 0: ^ is a global minimum. It follows that the least squares estimator ¯ Example: Suppose one has data on the consumption of n = 4 families along with their income as: i: 1 2 3 4 Yi = 72 58 63 55 Xi = 98 80 91 73 so that for example family 2 has consumption of 58 and an income of 80: We seek the best line Y = ¯X which goes through the data plotted below: 95 90 85 80 75 56 58 60 62 64 Income 66 68 70 72 : The sum of squares is then: S (¯) = (72 ¡ 98¯)2 + (58 ¡ 80¯)2 + (63 ¡ 91¯)2 + (55 ¡ 73¯)2 CHAPTER 2. 
UNIVARIATE CALCULUS 64 which is plotted below: 3000 2500 2000 1500 1000 500 0 0.4 0.5 0.6 0.7 beta 0.8 0.9 1 :: S (¯) As illustrated in the graph, the minimum of S (¯) occurs at: ^ ¯ X1 Y1 + X2 Y2 + ¢ ¢ ¢ + Xn Yn X12 + X22 + ¢ ¢ ¢ + Xn2 98 £ 72 + 80 £ 58 + 91 £ 63 + 73 £ 55 = 982 + 802 + 912 + 732 = 0:724: = ^= Thus the estimated marginal propensity to consume from this data set is ¯ 0:724: 2.4.2 Maximum Likelihood Maximum likelihood is a very general technique used in econometrics which can be applied to almost any problem including those where linear regression fails. The basic approach is to calculate the likelihood L (µ) where µ is a parameter of interest. The maximum likelihood estimator of µ is then that µ which maxiµ and hence solves the …rst-order mizes L (µ) : This is traditionally denoted by ^ conditions: ³ ´ dL ^ µ = 0: dµ It is usually easier to maximize the log-likelihood de…ned by l (µ) = ln (L (µ)) which gives the same result since ln (x) is a monotonic; that is: ³ ´ ³ ´ ³ ´ ^ dL ^µ dl ^ µ dL µ 1 =0, = ³ ´ = 0: dµ dµ dµ L ^ µ CHAPTER 2. UNIVARIATE CALCULUS 65 Once ^µ is found from the …rst-order conditions, we often wish to construct a con…dence interval which will then indicate how accurate our guess is. Traditionally one constructs a 95% con…dence interval for the unknown µ; which takes the form: p ^ µ § 1:96 £ ± where ± is the variance of ^µ: This formula says that µ will lie within the interval: p p ^µ ¡ 1:96 £ ± · µ · ^ µ + 1:96 £ ± 95 time out of 100; or equivalently, 19 times out of 20: To construct our con…dence interval calculate ± using: ³ ´ 1¡1 0 d2 l ^ µ A : ± = @¡ 2 dµ Note that since we are maximizing the log-likelihood from the second-order µ) d2 l(^ conditions: dµ2 < 0 so that ± > 0. Example: Suppose an unknown proportion of the population: µ favour some policy while 1 ¡ µ are against this policy. You decide to conduct a survey of the n randomly chosen people to estimate the unknown µ: Suppose mi = 1 if the ith person says he supports the policy, and mi = 0 if he says he does not support the policy. Since µ is the probability that mi = 1 and 1 ¡ µ is the probability that mi = 0 the probability of mi is given by: Pr [mi ] = µmi (1 ¡ µ)1¡mi : Since each person is chosen independently, the likelihood is the product of these probabilities: 1¡m1 £ µm2 (1 ¡ µ)1¡m2 £ ¢ ¢ ¢ £ µmn (1 ¡ µ)1¡mn L (µ) = µm1 (1 ¡ µ) n¡m = µm (1 ¡ µ) where m is the number of people in your survey who favour the policy and n¡m is the number of period in the survey against the policy. The log-likelihood is then: l (µ) = ln (L (µ)) = m ln (µ) + (n ¡ m) ln (1 ¡ µ) : Using the chain rule we …nd that: dl (µ) dµ ³ ´ dl µ^ m n¡m m n¡m = ¡ =) =0= ¡ ^ µ 1¡µ dµ µ 1¡^ µ m ^ =) µ = : n CHAPTER 2. UNIVARIATE CALCULUS 66 Thus if m = 525 say they are in favor and n = 2000 are interviewed then the log-likelihood is: l (µ) = 525 ln (µ) + (2000 ¡ 525) ln (1 ¡ µ) which is plotted below: -1200 -1400 -1600 -1800 -2000 -2200 -2400 -2600 -2800 -3000 -3200 -3400 0.1 0.2 0.3 0.4 0.5 theta 0.6 0.7 0.8 0.9 l (µ) As illustrated in the graph, the maximum occurs at: 525 ^ µ= = 0:26 2000 which says that the best guess about µ; based on the poll, is that 26 percent of the population are in favour of the policy. 
µ = 0:26 we use: To calculate a con…dence interval for ^ 0 1¡1 =n(1¡^ µ) ³ ´ 1¡1 ³ ´ =n^ µ 0 }| { z z}|{ 2 ^ ^ B m C µ 1 ¡ µ d l µ^ n ¡ m C A =B ± = @¡ B 2 +³ ´2 C = @ ^ A n dµ 2 ^ µ 1¡µ so that a 95% con…dence interval for the unknown µ takes the form: v ³ ´ u u^ µ tµ 1 ¡ ^ ^ µ § 1:96 £ : n Thus if m = 525 out of n = 2000 people polled are in favour of the policy, then: ±= 0:26 (1 ¡ 0:26) 2000 and the con…dence interval is: r 0:26 £ (1 ¡ 0:26) 2000 or 0:26 § 0:019: Thus the poll would be accurate to within 1:9 percentage points 19 times out of 20 (or 95 times out of 100): 0:2625 § 1:96 £ CHAPTER 2. UNIVARIATE CALCULUS 2.5 67 Ordinal and Cardinal Properties 2.5.1 Class Grades Consider a class of 4 students: John, Mary, Joe, and Sue. Suppose the instructor gives an A to the student with the highest grade, a B to the next highest and so on. The numerical and letter grades might then look like this: John 75 B Mary 50 D Joe 65 C Sue : 85 A Now suppose instead that the instructor adjusts (or bells) the grades by applying a monotonic function g (x) (with g 0 (x) > 0) to each grade. This could be g (x) = x ¡ 3:75; which would insure that the class average is 65 and which satis…es g 0 (x) = 1 > 0; or something crazy like g (x) = x2 which yields: John 752 = 5625 B Mary 502 = 2500 D Joe 652 = 4225 C Sue : 852 = 7225 A Notice that the numerical grades change when g (x) is applied (for example Joe’s grade changes from 65 to 4225) but that the letter grades do not change (for example Joe received a C before the grades were adjusted and he receives a C after the grades are adjusted.) If instead the instructor used g (x) = x3 we would …nd that: John 753 = 421875 B Mary 503 = 125000 D Joe 653 = 274625 C Sue 853 = 614125 A and all letters grades still remain unchanged. Adjusting the grades with some monotonic g (x) is known as a monotonic transformation. Letter grades are an example of what is known as an ordinal property, one which does not change no matter what monotonic transformation is applied. Cardinal properties, on the other hand, are properties which do change when a monotonic transformation is applied. Thus the student’s numerical grades are cardinal properties. It is important that we restrict ourselves to monotonic transformations; that is we do require that g 0 (x) > 0: To see why suppose instead that the instructor used: g (x) = x¡1 . Then we obtain: John 75¡1 = 0:0133 C Mary 50¡1 = 0:020 A Joe 65¡1 = 0:0154 B Sue : 85¡1 = 0:0118D Now it is Mary who receives an A and Sue who receives a D and the letter grades do change here. The problem here is that: g 0 (x) = ¡x¡2 = ¡ and so g (x) = 1 x 1 <0 x2 is not a monotonic transformation. CHAPTER 2. UNIVARIATE CALCULUS 2.5.2 68 Ordinal and Cardinal Properties of Functions Let us now turn our attention to ordinal and cardinal properties of a function: f (x). 
We have: De…nition 63 Monotonic Transformation: If g (x) is a monotonic (i.e., g0 (x) > 0 for all x) then applying g (x) to f (x) as g (f (x)) is called a monotonic transformation of f (x) : We then have: De…nition 64 Ordinal Property: An ordinal property of a function y = f (x) is one which does not change when any monotonic transformation g (x) is applied to f (x) : De…nition 65 Cardinal Property: A cardinal property of a function y = f (x) is one which does change when at least one monotonic transformation g (x) is applied to f (x) : With class grades the student with the highest grade (Jane) and the student with the lowest grade (Mary) are always the same no matter what monotonic g (x) is applied. For functions Jane and Mary correspond to the global maximum and minimum x¤ and so we have: Theorem 66 The global maximum (global minimum) x¤ of f (x) is an ordinal property. If f (x) = g (h (x)) with g 0 (x) > 0 then x¤ is a global maximum (minimum) of f (x) if and only if x¤ is a global maximum (minimum) of h (x) : Example: Consider: 1 f (x) = x ¡ x2 2 for 0 < x < 2: You can easily show that f (x) has a global maximum at x¤ = 1: Now suppose we apply the monotonic transformation: g (x) = x2 which leads us to: µ ¶2 1 h (x) = g (f (x)) = x ¡ x2 : 2 CHAPTER 2. UNIVARIATE CALCULUS 69 According to the theorem x¤ = 1 is also a global maximum of h (x) : This can be seen from the plot of f (x) and h (x) below: 0.5 0.4 0.3 0.2 0.1 0 2.5.3 0.2 0.4 0.6 0.8 1 x 1.2 1.4 1.6 1.8 2 : ¢2 ¡ f (x) = x ¡ 12 x2 ; h (x) = x ¡ 12 x2 Concavity and Convexity are Cardinal Properties We have seen that a very important property of a function f (x) is whether it is mountain-like or concave, or whether it is valley-like or convex. Now we might ask, is global concavity or convexity an ordinal or a cardinal property of f (x)? In other words if f (x) is globally concave (convex) does it follow that g (f (x)) is globally concave (convex) when g (x) is monotonic. Surprisingly the answer is no! Theorem 67 Concavity and Convexity are Cardinal properties. If f (x) is concave (convex) then it does not follow that a monotonic transformation: h (x) = g (f (x)) is concave (convex). Proof. Here we use proof by counter-example. Suppose for x > 0 that 1 f (x) = x 2 : f (x) is globally concave since: 1 3 f 00 (x) = ¡ x¡ 2 < 0: 4 Now suppose we let g (x) = x4 so that g 0 (x) = 4x3 > 0 and so g (x) is monotonic. Then: h (x) = g (f (x)) ³ 1 ´4 = x2 = x2 : But h (x) = x2 is globally convex (since h00 (x) = 2 > 0 ). Thus while f (x) is concave and a monotonic transformation h (x) is not concave (in fact it is convex). CHAPTER 2. UNIVARIATE CALCULUS 70 More generally note that if h (x) = g (f (x)) then using the chain rule: h0 (x) = g 0 (f (x)) f 0 (x) + ? + z }| {z }| { z }| { 2 00 0 0 =) h (x) = g (f (x))(f (x)) + g (f (x))f 00 (x) : 00 We cannot show that h00 (x) and f 00 (x) have the same sign because we do not know the sign of g 00 (f (x)) ; that is we only make assumptions about the …rst derivative of g (x) and not about the second derivative. This then is the basic reason why concavity and convexity are cardinal and not ordinal properties of a function. 
2.5.4 Quasi-Concavity and Quasi-Convexity Since concavity and convexity are cardinal properties, let us de…ne a new kind of concavity and convexity, called quasi-concavity and quasi-convexity which are ordinal properties of a function: De…nition 68 Quasi-Concavity: A function f (x) is quasi-concave if and only if it is a monotonic transformation of a concave function; that is: f (x) = g (h (x)) where g 0 (x) > 0 for all x and h (x) is globally concave. De…nition 69 Quasi-Convexity: A function f (x) is quasi-convex if and only if it is a monotonic transformation of a convex function; that is: f (x) = g (h (x)) where g 0 (x) > 0 and h (x) is globally convex. If f (x) is convex (concave) then it is also quasi-convex (quasi-concave) since we can always let g (x) = x (with g0 (x) = 1 > 0) in which case f (x) = h (x) : Thus: Theorem 70 All convex functions are quasi-convex but not all quasi-convex functions are convex. Theorem 71 All concave functions are quasi-concave but not all quasi-concave functions are concave. If g (x) is monotonic in f (x) = g (h (x)) (2.1) then it follows that g (x) has an inverse function, g~ (x) If we apply g~ (x) to both sides of (2:1) we free h (x) from inside g (x) and obtain: h (x) = g~ (f (x)) : Thus if f (x) is quasi-concave (quasi-convex) there exists a monotonic transformation of f (x) which makes it concave (convex). We therefore have: CHAPTER 2. UNIVARIATE CALCULUS 71 Theorem 72 A function f (x) is quasi-concave (quasi-convex) if and only if there exists a monotonic transformation g~ (x) such that: h (x) = g~ (f (x)) is concave (convex). Remark: There are thus two methods for showing that a function f (x) is quasi-concave (quasi-convex). We can either 1) show that f (x) is a monotonic transformation of a concave (convex) function or 2) show that a monotonic transformation of f (x) is concave (convex): Example: Consider the function: f (x) = 1 : 1 + x2 This function is not globally concave since: ¢ ¡ 2 3x2 ¡ 1 00 f (x) = (1 + x2 )3 so that f (x) is convex or f 00 (x) > 0 for jxj > p13 : We can however show that f (x) is quasi-concave. Using the …rst method we have: f (x) = g (h (x)) 1 1 with monotonic transformation: g (x) = 1¡x (with g 0 (x) = (1¡x) 2 > 0) and 2 00 concave function and h (x) = ¡x (with h (x) = ¡2 < 0 ) since: f (x) = g (h (x)) = 1 1 1 = = : 2 1 ¡ h (x) 1 ¡ (¡x ) 1 + x2 Using the second method let g (x) = ¡ x1 be the monotonic transformation (since g 0 (x) = x12 > 0 ) that we apply to f (x) : We then obtain: h (x) = g (f (x)) = ¡ 1 1 1+x2 ¢ ¡ = ¡ 1 + x2 ¢ ¡ where h (x) = ¡ 1 + x2 is globally concave since h00 (x) = ¡2 < 0: 2.5.5 New Su¢cient Conditions for a Global Maximum or Minimum Suppose we have a function f (x) that is quasi-concave (quasi-convex) so that f (x) = g (h (x)) where h (x) is concave (convex). Suppose further that we have CHAPTER 2. UNIVARIATE CALCULUS 72 a solution to the …rst-order conditions f 0 (x¤ ) = 0: From the chain rule this implies that: f 0 (x¤ ) = g0 (h (x¤ )) h0 (x¤ ) = 0 =) h0 (x¤ ) = 0: Since x¤ is also a solution to the …rst-order conditions for h (x) and since h (x) is concave (convex) it follows that x¤ is a global maximum (minimum) for h (x) : Since a global maximum (minimum) is an ordinal property from Theorem 66, it follows that x¤ is a global maximum for f (x) as well! 
Thus we have the following su¢cient conditions for a global maximum (minimum): Theorem 73 If f (x) is quasi-concave and x¤ satis…es the …rst-order conditions: f 0 (x¤ ) = 0 then x¤ is the unique global maximum of f (x) : Theorem 74 If f (x) is quasi-convex and x¤ satis…es the …rst-order conditions: f 0 (x¤ ) = 0 then x¤ is the unique global minimum of f (x) : Remark: Since all concave (convex) functions are quasi-concave (quasi-convex) but not all quasi-concave (quasi-convex) functions are concave (convex), these su¢cient conditions for a global maximum (minimum) are more widely applicable than the earlier su¢cient conditions that relied on concavity (convexity). Example: We have seen that the 1 1 + x2 f (x) = is quasi-concave. From the …rst-order conditions we have: f 0 (x¤ ) = ¡ 2x¤ (1 + x¤2 )2 = 0 =) x¤ = 0: Since f (x) is quasi-concave we conclude that x¤ = 0 is a global maximum. This is illustrated in the plot below: 1 0.8 0.6 y 0.4 0.2 -4 -2 0 f (x) = 2 1 1+x2 x 4 : CHAPTER 2. UNIVARIATE CALCULUS 2.6 73 Exponential Functions and Logarithms Almost all of the functions we have considered so far involve terms of the form: xa for some value of a: For some a > 0 consider reversing a and x to obtain a new kind of function: De…nition 75 Exponential Function: An exponential function takes the form: f (x) = ax where a > 0 is referred to as the base. Example: If we reverse the 2 and the x in x2 we obtain: f (x) = 2x with f (3) = 23 = 8 and f (¡3) = 2¡3 = illustrated below: 1 8. The exponential f (x) = 2x is 30 25 20 y 15 10 5 -2 -1 0 1 2 x 3 4 5 : x f (x) = 2 Note from the graph that this function is non-negative, monotonic and convex. In mathematics, as in economics, it turns out that there is a best base a for exponentials ax . This is the number e denoted by: 1 1 1 + + + ¢ ¢ ¢ or 1!µ 2! ¶3! n 1 : e ´ lim 1 + n!1 n e ´ 1+ One can show that the two de…nitions are equivalent and lead to: e ¼ 2:718281828: CHAPTER 2. UNIVARIATE CALCULUS 74 Remark: The second de…nition of e has an economic interpretation in terms of compound interest. If you put $1 in the bank at r = 1 or 100% interest com1 pounded annually then after one year you will have (1 + r) = $2: Now suppose th interest is compounded every n1 of a year so that for n = 2; 3 and 365 (i.e., every 6 months, 4 months and daily interest) you would receive respectively: ³ r ´2 1+ 2 ³ r ´3 1+ 3 ³ r ´365 1+ 365 µ ¶2 1 = 1+ = 2:25; 2 µ ¶3 1 = 1+ = 2:3704; 3 = 2: 7146: Thus as interest is compounded more and more often the amount of money you receive converges on e dollars, or approximately $2:72; so that e is amount of interest you would receive from continuous compounding. We have: De…nition 76 The exponential function to the base e is denoted as: f (x) = ex or f (x) = exp (x) is de…ned as: ³ x ´n : ex ´ lim 1 + n!1 n Remark 1: If you get confused with ex when using say the chain rule, try rewriting the problem with ex replaced by: exp (x) and think of the letters: exp taking the place of f as in f (x) : Remark 2: It follows from the de…nition that er is the amount of money you would obtain from investing $1 at interest rate r when interest is compounded continuously. This is why in economics you will often see expressions like er for discounting and compound interest. 
For example one dollar at 10% interest or r = 0:1 compounded continuously will give you after 1 year: e0:1 = 1:1052: Mathematically the most important reason for choosing e as the base is that the derivative of f (x) = ex is also ex so that: Theorem 77 If f (x) = ex then f 0 (x) = ex : Proof. (Informal) From the de…nition we have that en (x) ! ex as n ! 1 where: ³ x ´n : en (x) = 1 + n CHAPTER 2. UNIVARIATE CALCULUS 75 To …nd the derivative of ex di¤erentiate en (x) and then let n ! 1 so that: ³ x ´n¡1 en (x) ¢: =¡ e0n (x) = 1 + n 1 + nx ¡ ¢ Now since limn!1 1 + nx = 1 we have: dex limn!1 en (x) ¡ ¢ = ex : = lim e0n (x) = n!1 dx limn!1 1 + nx Since ex > 0 it follows that f 0 (x) = ex > 0 and so the function f (x) is monotonic . Furthermore since f 00 (x) = ex > 0 it follows that ex is globally convex. These properties are illustrated in the plot below: 20 15 y10 5 -2 -1 0 1 x 2 3 : x f (x) = e Theorem 78 The function f (x) = ex has the following properties: 1. ex > 0 for all x 2. ex is de…ned for all x (it has an unrestricted domain) 3. ex is globally increasing (i.e., f 0 (x) = ex > 0 ) 4. ex is globally convex (i.e., f 00 (x) = ex > 0 ) 5. e0 = 1 6. ex ey = ex+y 7. ex ey = ex+y 8. e¡x = 1 ex : Since f (x) = ex is monotonic, it follows that it has an inverse function, which is the logarithm to the base e or ln (x) de…ned as: CHAPTER 2. UNIVARIATE CALCULUS 76 De…nition 79 The function ln (x) is the inverse function of ex so that: eln(x) = x; ln (ex ) = x: The function ln (x) is plotted below: 2 1 0 2 4 x 6 8 10 -1 -2 y -3 -4 : ln (x) Remark: Note from the graph that ln (x) is not de…ned for x · 0: We can use the fact that ln (x) is the inverse function of ex to prove that: Theorem 80 The derivative of ln (x) is d ln(x) dx = x1 : Proof. Since ln (x) is the inverse function of ex we have: eln(x) = x: Di¤erentiating both sides with respect to x and using the chain rule we have: d ln(x) e dx = d d x =) e|ln(x) {z } dx ln (x) = 1 dx =x d =) x ln (x) = 1 dx 1 d ln (x) = : =) dx x We also have: Theorem 81 The function ln (x) has the following properties: 1. ln (xy) = ln (x) + ln (y) 2. ln (xy ) = y ln (x) : 3. ln (x) is de…ned only for x > 0 (it has a restricted domain) CHAPTER 2. UNIVARIATE CALCULUS 77 4. ln (x) an take on both negative and positive values (it has an unrestricted range) 5. ln (x) is globally increasing 6. ln (x) is globally concave 7. ln (1) = 0 ¡ ¢ 8. ln x1 = ¡ ln (x) : Proof. The …rst follows from eln(x)+ln(y) = eln(x) eln(y)¡ = xy¢and then taking y the ln ( ) of both sides. The second follows from xy = eln(x) = ey ln(x) and then taking the ln ( ) of both sides. The third follows from ex > 0 for all x and so if x < 0 we would have the contradiction eln(x) = x < 0:2 The …fth follows ln(x) = x1 > 0 for x > 0. This sixth follows since d dx = ¡ x12 < 0. since: d ln(x) 2 dx 0 ln(0) = 1: The …nal result follows from: To¡ show ¢the seventh note ¡ ¢that e = e ln x £ x1 = ln (x) + ln x1 = ln (1) = 0: Remark: The function ln (x) gets used a lot in economics. For example in applied econometrics, rather than working directly with a price P one usually works with ln (P ) : On of the reasons for this is that ln (x) converts multiplication into addition, and converts powers into multiplication. Example 1: Suppose you had data on Q and P and wished to estimate a constant elasticity of demand curve Q = AP ¯ . Since this is a non-linear function you cannot directly apply linear regression to your data. 
However using the properties of the ln ( ) function we obtain: ¢ ¡ ¢ ¡ Q = AP b =) ln (Q) = ln AP b = ln (A) + ln P ¯ =) q = ® + ¯p where q = ln (Q) ; ® = ln (A) and p = ln (P ) : You now have a linear relationship between q and p which can be estimated by linear regression. Furthermore the coe¢cient on the regressor p is the elasticity of demand: ¯: Example 2: Consider the function y = f (x) = x3 e¡x CHAPTER 2. UNIVARIATE CALCULUS 78 for x ¸ 0 which is plotted below: 1.2 1 0.8 y 0.6 0.4 0.2 0 2 4 6 x 8 10 : 3 ¡x y=x e To …nd the maximum of this function take the …rst-order conditions (using the product and chain rules) to obtain: f 0 (x) = 3x2 e¡x ¡ x3 e¡x = x2 e¡x (3 ¡ x) ¤ =) f 0 (x¤ ) = x¤2 e¡x (3 ¡ x¤ ) = 0 =) (3 ¡ x¤ ) = 0 ¤ which has a solution x¤ = 3: Note that x¤2 > 0 since x > 0 and e¡x > 0 since ex > 0 for all x: Here we cannot show that x¤ = 3 is a global maximum by showing global concavity since f (x) is not globally concave. This follows since: ¡ ¢ f 00 (x) = xe¡x x2 ¡ 6x + 6 ³ ³ ³ p ´´ p ´´ ³ x¡ 3+ 3 = xe¡x x ¡ 3 ¡ 3 p ¢ p ¢ ¡ ¡ and so f (x) is concave in the interval 3 ¡ 3 < x < 3 + 3 and convex outside this interval. We can show however that f (x) is quasi-concave since: f (x) = g (h (x)) where: g (x) = ex is monotonic and: h (x) = 3 ln (x) ¡ x is globally concave since: h00 (x) = ¡ 3 < 0: x2 CHAPTER 2. UNIVARIATE CALCULUS 79 It follows that: x¤ = 3 is a global maximum for both h (x) and f (x) : Example 3: The standard normal distribution, easily the most important probability distribution, has a probability density given by: 1 2 1 p (x) = p e¡ 2 x 2¼ which is plotted below: 0.4 0.3 0.2 0.1 -4 -2 p (x) = 0 2 x 4 : 1 2 p1 e¡ 2 x 2¼ Note from the graph that p (x) appears to be symmetric around 0: This is in fact the case since: p (¡x) = p (x) as: 2 1 1 2 1 1 p (¡x) = p e¡ 2 (¡x) = p e¡ 2 x = p (x) : 2¼ 2¼ The mode of p (x) (the maximum value of p (x)) is at x¤ = 0 . To show this we use the chain rule to obtain the …rst-order conditions as: p0 (x) = 1 1 2 p e¡ 2 x £ ¡x 2¼ 1 1 ¤2 =) p0 (x¤ ) = 0 =) p e¡ 2 x £ ¡x¤ = 0 2¼ =) x¤ = 0: We might try to show that x¤ = 0 is a global maximum by showing that p (x) is globally concave. We have however that: 1 1 1 2 1 2 p00 (x) = ¡ p e¡ 2 x + p e¡ 2 x £ x2 2¼ 2¼ ¢ 1 1 2 ¡ = p e¡ 2 x x2 ¡ 1 2¼ from which it follows that p (x) is concave for ¡1 < x < 1 but convex for x > 1 or x < ¡1: It follows then that p (x) is not globally concave. CHAPTER 2. UNIVARIATE CALCULUS 80 We can however show that p (x) is quasi-concave since µ µ ¶ ¶ 1 1 2 p (x) = exp ln p ¡ x 2 2¼ with monotonic function: g (x) = exp (x) and µ ¶ 1 1 h (x) = ln p ¡ x2 2 2¼ h00 (x) = ¡1 < 0 so that h (x) is globally concave. It follow then that p (x) is quasi-concave so that x¤ = 0 is a global maximum. It will sometimes occur that you are confronted with exponential functions which do not have e as the base, and less frequently logarithms which are not to the base e; such as log10 (x) : Given such problems the best strategy is to convert the problem from base a to base e using: Theorem 82 The functions ax or loga (x) can be converted to base e using: 1. ax = eln(a)x 2. loga (x) = ln (x) = ln (a) : ¢x ¡ Proof. Since a = eln(x) we have: ax = eln(a) = eln(a)x : To derive the ¢loga (x) ¡ = eln(a) loga (x) and take the ln ( ) second result use: x = aloga (x) = eln(a) of both sides. Example 1: Given the function y = 2x we can convert to base e using It then follows that: ´x ³ 2x = eln(2) = eln(2)x : d x 2 = ln (2) eln(2)x = ln (2) 2x dx using the chain rule. 
Example 2: If you recall we never de…ned xa for a non-integer a: In fact it is de…ned using ex and ln (x) as: xa ´ ea ln(x) : Thus the reason xa is not de…ned for x < 0 is that ln (x) is not de…ned. Using this de…nition we can prove that: CHAPTER 2. UNIVARIATE CALCULUS 81 Theorem 83 For x > 0 and for any a we have: dxa = axa¡1 : dx Proof. Using the de…nition of xa and the chain rule we …nd that: d a x dx d a ln(x) e dx 1 = a ea ln(x) x = ax¡1 xa = axa¡1 : = Example 3: For the function: f (x) = xx we now have x in the base and the exponent! We have no direct rule for calculating derivatives of this function. We can however change from base x to base e as: ´x ³ f (x) = xx = eln(x) = ex ln(x) : Therefore using the chain and product rules yields: f 0 (x) = =) =) =) =) (1 + ln (x)) ex ln(x) ¤ ¤ f 0 (x¤ ) = 0 = (1 + ln (x¤ )) ex ln(x ) ¤ ¤ 1 + ln (x¤ ) = 0 since ex ln(x ) > 0 ln (x¤ ) = ¡1 x¤ = e¡1 = 0:36788: Furthermore f (x) is globally convex since: µ ¶ 1 f 00 (x) = ex ln x (1 + ln (x))2 + >0 x CHAPTER 2. UNIVARIATE CALCULUS 82 and hence x¤ = e¡1 is a global minimum. This is illustrated in the plot below: 1 0.95 0.9 0.85 0.8 0.75 0.7 0 0.2 0.4 x 0.6 0.8 1 f (x) = xx Example 4: On your calculator you will see the log10 (x) which is the logarithm to the base 10 instead of base e: To …nd its derivative we use: log10 (x) = 1 ln (x) ln (10) so that: 1 d log10 (x) = : dx ln (10) x 2.6.1 Exponential Growth and the Rule of 72 Eighty percent of rules of thumb only apply 20 percent of the time -David Gunn Suppose we replace x by t; and think of t as time, and imagine that y is some variable (population, GNP etc.) that grows with time so that y = f (t) : Theorem 84 The growth rate of y per unit period of t (e.g. growth per year) is: f 0 (t) f (t + ¢t) ¡ f (t) = : ¢t!0 f (t) ¢t f (t) lim Many economic variables appear to growth at approximately the same rate over time. For example since the industrial revolution many advanced economies have grown at an average of around 2% a year. There is a functional form which has the property that the growth rate remains constant over time. We have: CHAPTER 2. UNIVARIATE CALCULUS 83 Theorem 85 The function f (t) = Ae¹t grows at a constant rate ¹ for all t: Proof. Using the chain rule and the properties of ex we have: ¹Ae¹t f 0 (t) = = ¹: f (t) Ae¹t Example: Thus if t is measured in years and ¹ = 0:03; then y = Ae0:03t grows at 3% every year. One way of understanding the implications of di¤erent growth rates is the time it takes for y to double. Let this be ¢t which then satis…es: f (t + ¢t) = 2f (t) or Ae¹(t+¢t) = 2Ae¹t : Solving for ¢t we …nd that ln (2) :69315 72 = ¼ ¹ ¹ ¹ £ 100% which gives the rule of 72; where 72 is chosen because it is a nice number with lots of divisors that is not too far away from 69. Thus if GN P grows at 2% a year, it will double approximately every 72 2 or 36 years. On the other hand if GN P grows at 4% a year it will double every 72 4 or 18 years. This can make a huge di¤erence since the economy that doubles every 18 years will be 4 times as large after 36 years while the economy which grows at 2% will only be twice as large. Thus imagine two countries with identical GN P at t = 0 (say in 1945) but where one grows at 2%=year and the other grows at 4%=year: ¢t = 9 8 7 6 5 4 3 2 1 0 10 20 t 30 40 50 : Plot of e0:02t and e0:04t : Small di¤erences in growth rates make a huge di¤erence! 
As the above graph illustrates, after 55 years (the time from 1945 to today 2000) the country that grows at 4%=year will have an economy three times as large as the economy that grows at 2%=year: CHAPTER 2. UNIVARIATE CALCULUS 2.7 84 Taylor Series Although you may not be aware of it, calculus is really a method for approximating functions with polynomials. For example a derivative corresponds to the slope of a tangent line, this tangent line being a …rst degree polynomial. Second derivatives basically involve approximating a function with a second degree polynomial or quadratic. The key concept which links these polynomial approximations with derivatives is the Taylor Series. Theorem 86 Taylor Series: A function f (x) can be approximated at x = x0 by an nth order polynomial f~ (x), called a Taylor Series, given by: f 2 (x0 ) f n (x0 ) (x ¡ x0 )2 + ¢ ¢ ¢ + (x ¡ x0 )n f~ (x) = f (x0 ) + f 1 (x0 ) (x ¡ x0 ) + 2! n! where f n (x0 ) is the nth derivative of f (x) evaluated at x = x0 : Remark: The approximation of f~ (x) to f (x) gets better the closer x is to x0 : When x = x0 the approximation becomes exact, that is f~ (x0 ) = f (x0 ) since n the terms (x ¡ x0 ) become 0: Example 1: The …rst-order Taylor Series: f~ (x) = f (x0 ) + f 0 (x0 ) (x ¡ x0 ) approximates an arbitrary function f (x) by a line. Consider: f (x) = x2 + 9 and let us construct a …rst-order Taylor series around x0 = 1: We need to calculate two number f (x0 ) and f 0 (x0 ) as: f (x) = x2 + 9 =) f (x0 ) = f (1) = 12 + 9 = 10 f 0 (x) = 2x =) f 0 (x0 ) = f 0 (1) = 2 £ 1 = 2 so that: f~ (x) = f (1) + f 0 (1) (x ¡ 1) = 10 + 2(x ¡ 1) = 8 + 2x: To see how the approximation works, consider an x close to x0 = 1; say x = 1:2: Then we have: f (1:2) = (1:2)2 + 9 = 10:44 CHAPTER 2. UNIVARIATE CALCULUS 85 while the Taylor series approximation gives: f~ (1:2) ¼ 10 + 2(1:2 ¡ 1) = 10:4: On the other hand if x is far from x0 = 1; say x = 10 then: f (10) = 102 + 9 = 109 f~ (10) = 10 + 2 (10 ¡ 1) = 29 and f~ (x) does a poor job of approximating f (x) : A plot of f (x) = x2 + 9 and its straight-line Taylor series approximation f~ (x) = 10 + 2(x ¡ 1) is given below: 30 25 20 15 10 5 -4 -2 0 2 x 4 : f (x) and f~ (x) Example 2: The second-order Taylor Series is given by: f 00 (x0 ) (x ¡ x0 ) 2 f~ (x) = f (x0 ) + f 0 (x0 ) (x ¡ x0 ) + 2 which approximates the function f (x) at x0 by a quadratic. If f (x) = x3 + 9 then in order to calculate a second-order Taylor series around x0 = 1 we need to calculate three numbers: f (x0 ) ; f 0 (x0 ) and f 00 (x0 ) which are given by: f (x0 ) = 13 + 9 = 10 f 0 (x) = 3x2 =) f 0 (x0 ) = f 0 (1) = 3 £ 12 = 3 f 00 (x) = 6x =) f 00 (x0 ) = f 00 (1) = 6 £ 1 = 6 and so the second-order Taylor series is: f 00 (1) (x ¡ 1)2 f~ (x) = f (1) + f 0 (1) (x ¡ 1) + 2 6 = 10 + 3(x ¡ 1) + (x ¡ 1)2 2 3x2 ¡ 3x + 10: CHAPTER 2. UNIVARIATE CALCULUS 86 To see how the approximation does, let us …rst pick an x close to x0 = 1; say x = 1:2: We then have: f (1:2) = (1:2)3 + 9 = 10:728 while the Taylor Series approximation gives 6 f~ (1:2) = 10 + 3(1:2 ¡ 1) + (1:2 ¡ 1)2 = 10:72: 2 On the other hand if we choose an x far from x0 = 1; say x = 7; we obtain: f (7) = (7)3 + 9 = 352 6 f~ (1:2) = 10 + 3(7 ¡ 1) + (7 ¡ 1)2 = 136 2 and so we obtain a poor approximation. A plot of the cubic x3 + 9 and its quadratic second-order Taylor series approximation around x0 = 1 is given below. 
70 60 50 40 30 20 10 -2 -1 0 1 2 x 3 4 f (x) and f~ (x) 2.7.1 The Error of the Taylor Series Approximation A natural question is to ask how well does f~ (x) approximate f (x)? The French mathematician Lagrange showed that the error of an nth order Taylor series ¹; where x ¹ lies approximation is equal to the n + 1th term with x0 replaced by x between x and x0 : Thus: Theorem 87 The error of the nth order Taylor series approximation is given by: f n+1 (¹ x) (x ¡ xo )n+1 (n + 1)! CHAPTER 2. UNIVARIATE CALCULUS 87 so that: f (x) = f (xo ) + f 0 (xo ) (x ¡ xo ) + ¢ ¢ ¢ + f n (x0 ) f n+1 (¹ x) (x ¡ xo )n+1 (x ¡ xo )n + n! (n + 1)! where x ¹ lies between x0 and x: Example: For the …rst-order Taylor we have: f (x) = f (xo ) + f 0 (xo ) (x ¡ xo ) + ¹ lies between x0 and x: where x f 00 (¹ x) (x ¡ xo )2 2! CHAPTER 2. UNIVARIATE CALCULUS 88 To see how this can be used, let us now prove that a concave (convex) function has a unique global maximum (minimum) at x¤ : Proof. Suppose that x¤ solves the …rst-order conditions: f 0 (x¤ ) = 0 and 00 f (x) < 0 for all x: A …rst-order Taylor series (with the error term) of f (x) around x0 = x¤ takes the form: =0 z }| { f 00 (¹ x) (x ¡ x¤ )2 f (x) = f (x¤ ) + f 0 (x¤ ) (x ¡ x¤ ) + 2! f 00 (¹ x) (x ¡ x¤ )2 : = f (x¤ ) + 2! Now since f (x) is concave (convex) it follows that: f 00 (x) < 0 for all x (f 00 (x) > x) < 0 (f 00 (¹ x) > 0): If x 6= x¤ it follows that 0 for all x) it follows that f 00 (¹ ¤ 2 (x ¡ x ) > 0 and hence we have: ¡ z }| { f 00 (¹ x) ¤ ¤ 2 (x ¡ x ) < f (x¤ ) : f (x) = f (x ) + 2! This says that for any x 6= x¤ that f (x) < f (x¤ ) so that x¤ is a global maximum (minimum). CHAPTER 2. UNIVARIATE CALCULUS 2.7.2 89 The Taylor Series for ex and ln (1 + x) Consider calculating a Taylor series for ex around x0 = 0. The nth term is of the form: xn f n (x0 ) (x ¡ x0 )n = n! n! since if f (x) = ex then f n (x) = ex and hence f n (0) = e0 = 1: It turns out that by letting n ! 1 we obtain an exact result, which is often used to de…ne ex as follows: Theorem 88 The in…nite order Taylor series for ex around x0 = 0 is exact for all x and is given by: ex = 1 + x + x2 x3 x4 + + +¢¢¢. 2! 3! 4! As an exercise take the derivative of both sides and show that f 0 (x) = ex : Another important result is the Taylor series for ln (1 + x): Theorem 89 The Taylor series of ln (1 + x) around x0 = 0: ln (1 + x) = x ¡ x2 x3 x4 + ¡ +¢¢¢ . 2 3 4 is exact for jxj < 1: Example: From the …rst-order Taylor for ln (1 + x) we have: ln (1 + x) ¼ x for x small. For example: ln (1 + 0:1) = :09531 ¼ 0:1: The Taylor series of ln (1 + x) can be used to de…ne an alternative measure of percentage that is very useful in economics. Suppose you want to calculate the percentage change from x1 to x2 : The normal way of doing this would be as: x2 ¡ x1 £ 100%: x1 Thus if x2 = 110 and x1 = 100 the percentage change so de…ned is 10%: All de…nitions of percentage su¤er from the fact that the choice of the base is to some extent arbitrary. Thus instead of using 100 as the base we could equally have well used 110; or indeed any number between 100 and 110 such as CHAPTER 2. UNIVARIATE CALCULUS 90 the midpoint 105. If for example we had used 110 as the base we would have as our de…nition of percentage: x2 ¡ x1 £ 100% x2 which would lead a percentage change of 9:0909%: Now consider calculating the de…nition of percentage as: (ln (x2 ) ¡ ln (x1 )) £ 100% or equivalently: ln µ x2 x1 ¶ £ 100%: Using this de…nition we would get 9:531%, which is intermediate between the two other de…nitions of percentage. 
To show why this third de…nition is sensible, use a …rst-order Taylor series approximation of ln (1 + x) ¼ x noting that: µ ¶ µ ¶ x2 x2 ¡ x1 x2 ¡ x1 : ln = ln 1 + ¼ x1 x1 x1 2.7.3 L’Hôpital’s Rule Consider the following problem. Suppose two functions f (x) and g (x) have the property that f (x0 ) = 0 and g (x0 ) = 0: We wish to …nd out what happens to f (x) 0 g(x) as x ! x0 : In general the ratio 0 is indeterminate so it is not clear what the limit is. Consider using a …rst-order Taylor series for f (x) and g (x) around x0 ; an approximation which will get better as: x ! x0 : Since f (x0 ) = g (x0 ) = 0 we have: f (x0 ) + f 0 (x0 ) (x ¡ x0 ) f 0 (x0 ) (x ¡ x0 ) f 0 (x0 ) f (x) ¼ = 0 = 0 : 0 g (x) g (x0 ) + g (x0 ) (x ¡ x0 ) g (x0 ) (x ¡ x0 ) g (x0 ) This yields L’Hôpital’s rule: Theorem 90 L’Hôpital’s Rule I If f (x0 ) = g (x0 ) = 0 then: lim x!x0 f (x) f 0 (x) = lim 0 : g (x) x!x0 g (x) Another version of L’Hôpital’s rule is: Theorem 91 L’Hôpital’s Rule II If f (x0 ) = g (x0 ) = 1 then: lim x!x0 f (x) f 0 (x) = lim 0 : g (x) x!x0 g (x) CHAPTER 2. UNIVARIATE CALCULUS 91 Remark: L’Hôpital’s rule does not work if either f (x) or g (x) does not approach 0 or 1: Example: Suppose f (x) = x2 ¡ 1 and g (x) = x ¡ 1 so that: f (1) = g (1) = 0: Since f 0 (x) = 2x and g0 (x) = 1 we have: x2 ¡ 1 2x = lim = 2: x!1 x ¡ 1 x!1 1 lim 2.7.4 Newton’s Method Consider the problem of …nding a root of a function f (x) ; that is we wish to calculate an x+ which satis…es ¡ ¢ f x+ = 0: This is a very common problem in economics and econometrics. For example suppose we wish to minimize or maximize a function g (x) : Then we would want to calculate the root of g 0 (x) ; that is the x+ = x¤ which satis…es the …rst-order conditions: g 0 (x¤ ) = 0: Solving for roots is easy to do for linear functions, quadratics and certain other special functions. Generally speaking functions for which a formula exists for calculating a root are the exception. Although there exists no formula, there do exist numerical methods for calculating x+ . These numerical methods, combined with the use of computers, make the solving of these sorts of problems routine today. The basic method was invented by Newton and involves approximating f (x) with a …rst-order Taylor series. The …rst step is to make an educated guess what the root x+ might be. Call this guess x0 and approximate f (x) around x0 using a …rst-order Taylor series so that: f (x) ¼ f~ (x) = f (x0 ) + f 0 (x0 ) (x ¡ x0 ) : Although we cannot solve f (x+ ) = 0; it is easy to solve f~ (x) = 0 since f~ (x) is just a linear function. Let x1 be the value of x that solves f~ (x) = 0 so that: f~ (x1 ) 0 =) f (x0 ) + f 0 (x0 ) (x1 ¡ x0 ) = 0 f (x0 ) =) x1 = x0 ¡ 0 f (x0 ) = While x1 is not a root of f (x) it will generally be closer to x+ than x0 : To get an even better estimate of x we apply the same method again but now using x1 that is we use: f (x) ¼ f~ (x) = f (x1 ) + f 0 (x1 ) (x ¡ x1 ) CHAPTER 2. UNIVARIATE CALCULUS 92 so that solving for f~ (x2 ) = 0 we obtain: x2 = x1 ¡ f (x1 ) : f 0 (x1 ) The new guess x2 will generally be closer to x than x1 : This procedure can be repeated again and again using: xn = xn¡1 ¡ f (xn¡1 ) f 0 (xn¡1 ) until xn close enough to x : This is Newton’s method which is represented graphically below. CHAPTER 2. UNIVARIATE CALCULUS 93 Example: Suppose you wish to …nd a root of: f (x) = x7 ¡ 3x + 1 so that x satis…es x7 ¡ 3x + 1 = 0: To apply Newton’s method we will need : f 0 (x) = 7x6 ¡ 3: Let x 0 = 1 be our initial guess. 
Then since f (1) = 17 ¡ 3 £ 1 + 1 = ¡1; f 0 (1) = 7 £ 16 ¡ 3 = 4 we have: f~ (x) = ¡1 + 4 (x ¡ 1) so that: f~ (x1 ) = 0 =) ¡1 + 4 (x1 ¡ 1) = 0 =) x1 = 1:25: Thus x1 = 1:25 is our new guess of what x is. Repeating the procedure with x1 = 1:25 we …nd that f (1:25) = 2:018 4; f 0 (1:25) = 23: 703 so that: f~ (x) = 2: 018 4 + 23:703 (x ¡ 1:25) f~ (x2 ) = 0 =) x2 = 1:1648: Repeating this again with x 2 = 1:1648 we …nd that: f~ (x) = f (1:1648) + f 0 (1:1648) (x ¡ 1:1648) f~ (x3 ) = 0 =) x3 = 1:1362: To obtain more precision we iterate one more time with x3 = 1:1362 so that: f~ (x) = 0:033453 + 12:06 (x ¡ 1:1362) ~ f (x4 ) = 0 =) x4 = 1:1334: The actual root, to 5 decimal places is x = 1:1332 so our solution x 4 = 1:1334 is o¤ by 0:0002 . For many purposes this is close enough although further accuracy can be obtained by iterating further. CHAPTER 2. UNIVARIATE CALCULUS 94 You might want to try and …nd the other two real roots x = ¡1:2492 and x = 0:33349 which can be found by using other starting values (try the starting values x¤0 = 1:5 and x¤0 = 0 respectively). You can see all three roots graphically below: 10 y 5 -1.5 -1 -0.5 0 0.5 x 1 1.5 -5 -10 : f (x) = x7 ¡ 3x + 1 2.8 Technical Issues 2.8.1 Continuity and Di¤erentiability Nothing is accomplished all at once, and it is one of my great maxims ... that nature makes no leaps.... This law of continuity declares that we pass from the small to the great - and the reverse - through the medium, in degree as well as in parts. -Leibniz Natura non facit saltum (nature does not make a jump). -On the title page of Alfred Marshall’s 1890 Principles of Economics Not all functions are continuous, and not all functions have a derivative. These functions can often be ignored for the purposes of intermediate economics as mathematical freaks. Nevertheless there is some virtue in having in the back of your mind the idea that sometimes continuity and di¤erentiability are issues. Example 1: The function: f (x) = ½ x2 if x ¸ 2 ¡5x if x < 2 CHAPTER 2. UNIVARIATE CALCULUS 95 plotted below: 25 20 15 y 10 5 -4 0 -2 2 4 x -5 -10 A Discontinuous Function is not continuous at x = 2 and so does not have a derivative or a slope at x = 1: Thus at x = 2 we really cannot say if f (x) is increasing or decreasing. Example 2: In order to have a derivative a function must be continuous, but there are continuous functions that do not have derivatives. For example the function: f (x) = ¡ jx ¡ 1j1=2 which is plotted below: -0.5 -1 y -1.5 -2 -2.5 -4 -2 0 2 4 x 6 8 f (x) = ¡ jx ¡ 1j1=2 is continuous but does not have a derivative at x = 1. In particular f 0 (x) = ¡ p1 for x < 1 and f 0 (x) = p 1 for x > 1 and f 0 (x) ! §1: as x ! 1 2 jx¡1j 2 jx¡1j from either the left or the right. The problem is that the function has a kink at x = 1; and so its derivative does not exist at this point. CHAPTER 2. UNIVARIATE CALCULUS 2.8.2 96 Corner Solutions In a more advanced treatment of the …rst-order conditions we would have stated that if x¤ is a maximum and x¤ does not lie on the boundary of the domain of f (x) then: f 0 (x¤ ) = 0: If x¤ lies on the boundary of the domain we have a corner solution and it is not necessarily the case that f 0 (x¤ ) = 0: To see why this might matter consider the fact that in economics prices and quantities are positive we often require that the domain of f (x) have x ¸ 0: It follows then that 0 is on the boundary of the such a domain. Sometimes it occurs where x¤ = 0 as when a …rm decides not to hire any labour or when a household does not buy any of the good. 
In this case the …rst-order conditions no longer require f 0 (x¤ ) = 0: Example: Consider the problem of maximizing f (x) = 10 ¡ (x + 1)2 where we restrict the domain of f (x) to be x ¸ 0: If we use the …rst-order conditions we …nd that: f 0 (x) = ¡2 (x + 1) =) f 0 (x¤ ) = 0 =) x¤ = ¡1: Note however that x¤ = ¡1 is not in the domain of f (x) since we require: x ¸ 0: So what value of f (x) maximizes f (x) for x ¸ 0? Consider the plot of f (x) below: 9 8 7 y 6 5 4 0 0.2 0.4 0.6 0.8 x 1 1.2 1.4 : f (x) = 10 ¡ (x + 1)2 Note that f (x) is maximized at x¤ = 0; that is we have a maximum on the boundary of the domain of f (x) : From the graph or from: f 0 (0) = ¡2 (0 + 1) = ¡2 < 0: we see that at x¤ = 0 that f 0 (x) < 0. A more systematic treatment of the …rst-order conditions at corner solutions leads to the Kuhn-Tucker conditions which are better left for more advanced study. CHAPTER 2. UNIVARIATE CALCULUS 2.8.3 97 Advanced Concavity and Convexity If you go on and do more advanced work you will …nd that our de…nitions of concavity and convexity are not entirely adequate for all purposes. Consider for example the function: f (x) = x4 which is plotted below: 1 0.8 0.6 y 0.4 0.2 -1 0 -0.5 0.5 x 1 : 4 y=x From the plot f (x) certainly looks valley-like for all x and hence we would like to say that x4 is globally convex. If we check the second derivative we have f 00 (x) = 12x2 ¸ 0 but for x = 0 we have f 00 (0) = 0: Thus according to our de…nition of convexity x4 is not convex since we require that f 00 (x) > 0 for all x: Another problem arises with the absolute value function y = jxj which is plotted below: 5 4 3 y 2 1 -4 -2 0 y = jxj 2 x 4 : CHAPTER 2. UNIVARIATE CALCULUS 98 Again this function clearly looks valley-like and hence we would like to say that it is convex. However f 00 (x) = 0 for x 6= 0 and f 00 (0) is not de…ned. These problems can be dealt with by using more sophisticated de…nitions of concavity and convexity. In particular we have: De…nition 92 A function f (x) is convex if and only if it is the case for all x1 and x2 in the domain of f (x) that for all 0 · ¸ · 1: x3 = ¸x1 + (1 ¡ ¸) x2 is in the domain of f (x) and f (x3 ) · ¸f (x1 ) + (1 ¡ ¸) f (x2 ) : De…nition 93 A function f (x) is concave if and only if it is the case for all x1 and x2 in the domain of f (x) that for all 0 · ¸ · 1: x3 = ¸x1 + (1 ¡ ¸) x2 is in the domain of f (x) and f (x3 ) ¸ ¸f (x1 ) + (1 ¡ ¸) f (x2 ) : Remark 1: The de…nition says that if you draw a line between any two points on the graph of a convex function f (x) then the line falls everywhere above the graph of f (x) ; while if you draw a line between two points of a concave function the line falls below the graph. This is illustrated in the diagram below: CHAPTER 2. UNIVARIATE CALCULUS 99 Remark 2: Note that both f (x) = jxj and f (x) = x4 are convex according to the more advanced de…nition. Finally in more advanced work one makes the distinction between strictly convex (concave) functions which have no linear segments, and convex (concave) functions which are allowed to have linear segments. Thus f (x) = jxj is convex but not strictly convex because it is linear to the right and left of 0 while f (x) = x2 ; which has no linear segments, is strictly convex. In particular: De…nition 94 A function f (x) is strictly convex if and only if it is the case for all x1 and x2 in the domain of f (x) such x1 6= x2 that for all 0 < ¸ < 1: x3 = ¸x1 + (1 ¡ ¸) x2 is in the domain of f (x) and f (x3 ) < ¸f (x1 ) + (1 ¡ ¸) f (x2 ) : CHAPTER 2. 
UNIVARIATE CALCULUS 100 De…nition 95 A function f (x) is strictly concave if and only if it is the case for all x1 and x2 in the domain of f (x) such x1 6= x2 that for all 0 < ¸ < 1 : x3 = ¸x1 + (1 ¡ ¸) x2 is in the domain of f (x) and f (x3 ) < ¸f (x1 ) + (1 ¡ ¸) f (x2 ) : Note the more severe requirements in these de…nitions that x1 6= x2 and that ¸ = 0 and ¸ = 1 are not allowed. This means that all strictly convex (concave) functions are also convex (concave) but a convex (concave) function is not necessarily strictly convex (concave). It also turns out that our de…nitions of quasi-concavity and quasi-convexity are ‡awed. For completeness you might also want to see the advanced de…nitions: De…nition 96 A function f (x) is quasi-concave if and only if for all x1 ; x2 in the domain of f (x) that for 0 · ¸ · 1: x3 = ¸x1 + (1 ¡ ¸) x2 is also in the domain of f (x) and: f (x3 ) ¸ min [f (x1 ) ; f (x2 )] : De…nition 97 A function f (x) is quasi-concave if and only if for all x1 ; x2 in the domain of f (x) that for 0 · ¸ · 1: x3 = ¸x1 + (1 ¡ ¸) x2 is also in the domain of f (x) and: f (x3 ) · max [f (x1 ) ; f (x2 )] : Chapter 3 Matrix Algebra Such is the advantage of a well constructed language that its simpli…ed notation often becomes the source of profound theories. -PierreSimon Laplace A matrix is basically just a table of given by: 2 50 A = 4 35 65 numbers. For example the matrix A 3 75 65 5 85 could be the grades of three students over two exams. We implicitly work with matrices all the time when we work with data. Matrix algebra is the art of manipulating matrices in a manner similar to manipulating ordinary numbers in ordinary algebra. Thus we will learn to add subtract, multiply and divide matrices. It is even possible to calculate eA or ln (A) where A is a matrix. In many ways matrix algebra is nothing more than a convenient notation. It is always possible in principle to avoid matrix algebra by working directly with the ordinary numbers inside the matrices. This is however rather like walking from Los Angeles to New York rather than ‡ying! There are for example derivations in econometrics that might require …ve pages without matrix algebra but which can be performed in only a few lines using matrix algebra. Matrix algebra is a profound notation, one that allows you to see things that you would never see otherwise. Along with calculus, it is one of the two fundamental mathematical skills that a student of economics must acquire. The cost of the power of matrix algebra is danger! Many of your instincts from ordinary algebra will lead you astray when you work with matrices. The classic example of this is that for matrices A £ B and B £ A are no longer the 101 CHAPTER 3. MATRIX ALGEBRA 102 same! For this reason you need to be careful in the beginning (and even later on!) until you have developed reliable instincts. We begin by de…ning a matrix: De…nition 98 Matrix: An m £ n matrix A with m rows and n columns takes the form: 2 3 a11 a12 ¢ ¢ ¢ a1n 6 a21 a22 ¢ ¢ ¢ a2n 7 6 7 A = [aij ] = 6 . .. .. 7 .. 4 .. . . . 
5 am1 am2 ¢ ¢ ¢ amn where aij is the element in the ith row and the j th column of A: Example: The 3 £ 2 matrix: 2 a11 A = 4 a21 a31 3 2 3 5 4 a12 a22 5 = 4 3 1 5 a32 6 2 has a12 = 4; a21 = 3 and a32 = 2: The case of square matrices and their diagonal elements will be of particular importance: De…nition 99 Square Matrix An m£n matrix A is a square matrix if m = n: De…nition 100 The Diagonal of a Square Matrix: Given an n £ n square matrix A = [aij ] the diagonal elements are those elements aij for which i = j: Example: A 2 £ 2 square matrix is: · ¸ 5 4 : 3 1 The diagonal elements are a11 = 5 and a22 = 1. Remark: Note that the diagonal goes from the top left-hand corner to the bottom right-hand corner as: 3 2 .. . 5: 4 .. . For our purposes there is nothing particularly interesting about the ‘other diagonal’, the one that goes from the top right-hand corner to the bottom left-hand corner. Also of special importance in matrix algebra are vectors, which come in two ‡avors, and scalars: CHAPTER 3. MATRIX ALGEBRA 103 De…nition 101 Row Vector: A row vector x = [xi ] is a 1 £ n matrix. De…nition 102 Column Vector: A column vector x = [xi ] is a n £ 1 matrix. De…nition 103 Scalar: A scalar is a 1£1 matrix or just an ordinary number. Example: Below x is a 3 £ 1 column vector, y is a 1 £ 3 row vector and z is a 1 £ 1 scalar: 2 3 1 £ ¤ x = 4 2 5 ; y = 5 4 2 ; z = 3: 3 Remark: Any m £ n matrix A can be usefully thought of as consisting of n column vectors of dimension m £ 1 or m row vectors of dimension 1 £ n: Example: The 3 £ 2 matrix: 2 3 2 2 3 2 3 3 2 £ 5 4 ¤ 3 6 5 4 5 4 6 £ ¤ A=4 3 1 5=4 4 3 5 4 1 5 5=6 6 3 1 4 6 2 6 2 £ ¤ 6 2 is made up of two column vectors: 2 3 2 3 5 4 4 3 5 and 4 1 5 6 2 7 7 7 7 5 or three row vectors: £ 3.1 5 4 ¤ £ ¤ £ ¤ ; 3 1 and 6 2 : Matrix Addition and Subtraction Your instincts from ordinary algebra are probably quite reliable for the addition and subtraction of matrices. The rules are very simple. If A and B are both m £ n matrices (of the same order) then: De…nition 104 If A = [aij ] and B = [bij ] are both m £ n matrices and C = A + B; then C is an m £ n matrix with: C = [aij + bij ] : De…nition 105 If A = [aij ] and B = [bij ] are both m £ n matrices and C = A ¡ B; then C is an m £ n matrix with: C = [aij ¡ bij ] : CHAPTER 3. MATRIX ALGEBRA 104 Remark: The only way you are likely to wrong here is if you try and add or subtract two matrices of a di¤erent order. Example 1: 2 3 2 3 4 4 ¡2 1 5 + 4 6 2 2 3 2 3 4 4 ¡2 1 5 ¡ 4 6 2 3 2 3 5 3 8 7 8 3 5 = 4 6 4 5 9 1 15 3 3 2 3 5 3 ¡2 1 8 3 5 = 4 ¡10 ¡2 5 : 9 1 ¡3 1 Example 2: The sum: · 3 4 ¡2 1 ¸ 2 3 5 3 +4 8 3 5 9 1 is not de…ned since the two matrices are not of the same order. It is also possible to multiply any matrix by any scalar. Again the rule is very simple: De…nition 106 If C = ®A where ® is a scalar and A = [aij ] is an m £ n matrix then C is an m £ n matrix with C = [®aij ] : Example: 2 3 2 3 2 3 3 4 6£3 6£4 18 24 6 4 ¡2 1 5 = 4 6 £ ¡2 6 £ 1 5 = 4 ¡12 6 5 : 6 2 6£6 6£2 36 12 3.1.1 The Matrix 0 We will often come across matrices which have all elements equal to zero: We have: De…nition 107 In matrix algebra when we write A = 0 we mean that all elements of A are equal to 0: De…nition 108 In matrix algebra when we write A 6= 0 we mean that A is not the 0 matrix; that is there exists at least one element of A which is not zero. Example 1: Given the 3 £ 2 matrix A; 2 3 3 4 4 ¡2 1 5 6 2 CHAPTER 3. 
MATRIX ALGEBRA 105 if we subtract it from itself we obtain: A¡A =0 or: 2 3 2 3 2 3 3 4 3 4 0 0 4 ¡2 1 5 ¡ 4 ¡2 1 5 = 4 0 0 5 : 6 2 6 2 0 0 Note that here 0 is not the ordinary number 0 but a 3 £ 2 matrix of 00 s; that is: 2 3 0 0 0 = 4 0 0 5: 0 0 Thus if we were just to write A ¡ A = 0 under the assumption that A is 3 £ 2; it is left implicit that the dimension of 0 matrix is also 3 £ 2: Example 2: If 2 3 0 0 A=4 0 0 5 5 0 then we can legitimately write A 6= 0 since a32 6= 0: 3.2 Matrix Multiplication Unlike addition and subtraction, matrix multiplication is tricky and your instincts from ordinary algebra are likely unreliable. We begin with the simplest case where we multiply a row and a column vector. We have: De…nition 109 Let a = [ai ] be a 1 £ n row vector and let b = [bi ] be a n £ 1 column vector. Then the product ab is a scalar given by: 2 3 b1 6 b2 7 7 £ ¤6 6 7 ab ´ a1 a2 a3 ¢ ¢ ¢ an 6 b3 7 ´ a1 b1 + a2 b2 + ¢ ¢ ¢ + an bn : (3.1) 6 .. 7 4 . 5 bn Example: Given: a= £ 1 3 6 ¤ 2 3 2 and b = 4 4 5 ; 7 CHAPTER 3. MATRIX ALGEBRA 106 then the product of these two matrices is: 2 3 £ ¤ 2 ab = 1 3 6 4 4 5 = 1 £ 2 + 3 £ 4 + 6 £ 7 = 56: 7 Remark: Here the order is important; that is ab and ba are not equal! In the example above while ab is the scalar 56, we shall see that ba is in fact a 3 £ 3 matrix given by: 2 3 2 3 2 £ 2 6 12 ¤ ba = 4 4 5 1 3 6 = 4 4 12 24 5 : 7 7 21 42 Now consider calculating AB where A and B are not vectors. Part of the trick is to think of A as a collection of row vectors and B as a collection of column vectors. The elements of AB are then found by multiplying the row vectors of A with the column vectors of B in the manner we have just learned. De…nition 110 If A is an m £ n matrix and B is an n £ s matrix then to obtain AB write A as a collection of m row vectors and B as a collection of s column vectors as: 3 2 a1 6 a2 7 £ ¤ 7 6 A = 6 . 7 ; B = b1 b2 : : : bs . 4 . 5 am where the 1 £ n row vector ai is the ith row of A and the n £ 1 column vector bj is the j th column of B. The product C = AB is then an m £ s matrix de…ned as: 3 2 a1 b1 a1 b2 ¢ ¢ ¢ a1 bs 6 a2 b1 a2 b2 ¢ ¢ ¢ a2 bs 7 7 6 C´6 . .. .. 7 : .. 4 .. . . . 5 am b1 am b2 ¢¢¢ am bs In order for the product AB to be de…ned the number of columns in A must equal the number of rows in B: A recipe for determining if AB is de…ned, and then computing AB is found below: Recipe for Matrix Multiplication CHAPTER 3. MATRIX ALGEBRA 107 Given two matrices: A which is m £ n and B which is r £ s; write the dimensions of the matrices in the order you wish to multiply them. Thus for AB we would write m £ njr £ s: We have: 1. The product AB is de…ned if and only if the two numbers in the middle are the same; that is if n = r or m £ njr £ s: |{z} n=r 2. If 1. is satis…ed so that AB is de…ned, then the dimension of AB is found by eliminating the two identical numbers n and r in the middle as: m£ njr £ s =) m £ s |{z} eliminate so that AB is an m £ s matrix. 3. Write A as a collection of m row vectors and B as a collection of s column vectors. The i; j th element of C = AB = [cij ] is then found by multiplying the ith row vector in A with the j th column vector in B so that cij = ai bj : Example 1: Consider calculating AB for the two matrices: 2 3 2 3 3 4 6 7 A = 4 ¡2 1 5 and B = 4 8 4 5 : 6 2 6 3 Following the recipe we have: 1. Writing out the dimensions of AB as: 3 £ 2j3 £ 2 we see that the two inside numbers do not match (i.e., 2 6= 3 ) and so the product AB is not de…ned. There is therefore no AB to calculate! 
Example 2: Consider: 2 3 · ¸ 3 4 5 2 1 A = 4 2 1 5 and B = : 3 3 4 6 2 Following the recipe we have: CHAPTER 3. MATRIX ALGEBRA 108 1. Writing out the dimensions of AB as: 3 £ 2j2 £ 3 we see that the two middle numbers match and so the product AB is de…ned. 2. Deleting the two middle numbers we …nd that: 3 £ 2j2 £ 3 =) 3 £ 3 so that AB is a 3 £ 3 matrix. 3. To calculate AB we write A as a collection of 3 row vectors and B as a collection of 3 column vectors as: ¤ 3 2 £ 3 4 6 7 · · ¸ · ¸ · ¸ ¸ 6 £ ¤ 7 5 2 1 6 7 : A = 6 2 1 7; B = 3 3 4 5 4 £ ¤ 6 2 Carrying out the 2 £ 3 6 6 £ 6 2 AB = 6 6 6 £ 4 6 2 £ multiplication we …nd that: ¤ 3 4 7 ¤ 7· · ¸ · ¸ · ¸ ¸ 1 7 5 2 1 7 7 3 3 4 ¤ 7 2 5 ¤ · 5 3 ¸ 3 4 6 6 6 6 · ¸ 6 £ ¤ 5 6 2 1 = 6 3 6 6 6 · ¸ 4 £ ¤ 5 6 2 3 2 3 27 18 19 = 4 13 7 6 5 : 36 18 14 £ £ 3 4 2 1 £ ¤ ¤ 6 2 · · ¤ 2 3 2 3 · ¸ ¸ 2 3 ¸ £ 3 4 ¤ · 1 4 ¸ 3 7 7 7 · ¸ 7 7 £ ¤ 1 7 2 1 7 4 7 7 · ¸ 7 £ ¤ 1 5 6 2 4 For example to …nd the 2; 3 element of AB we multiply · ¸ £ ¤ 1 2 1 =2£1+1£4=6 4 while to obtain the 1; 1 element we multiply: · ¸ £ ¤ 5 3 4 = 3 £ 5 + 4 £ 3 = 27: 3 You should repeat the calculation of the remaining elements on your own. CHAPTER 3. MATRIX ALGEBRA 109 Example 3: Consider reversing the order of the multiplication in the previous example and calculating BA; where A and B are given above. Following the recipe we have: 1. Since B is 2 £ 3 and A is 3 £ 2 we have 2 £ 3j3 £ 2 so that the two inside numbers match and BA is de…ned. 2. Eliminating the two inside numbers we …nd that 2 £ 3j3 £ 2 =) 2 £ 2 so the resulting matrix will be a 2 £ 2 matrix. 3. Writing B as a collection of 2 row vectors and A as a collection of 2 column vectors as: ¤ 3 2 £ 2 2 3 2 3 3 5 2 1 3 4 4 5 4 4 5 4 2 1 5 5 B= £ ¤ ; A= 3 3 4 6 2 we have: 2 £ BA = 4 2 6 6 6 6 = 6 6 6 6 4 = · 3 2 3 3 3 4 54 4 2 5 4 1 5 5 £ ¤ 3 3 4 6 2 2 3 2 3 3 £ ¤ 3 £ ¤ 4 5 2 1 4 2 5 5 2 1 4 1 5 7 7 7 6 2 7 7 2 3 2 3 7 7 7 £ ¤ 3 £ ¤ 4 5 4 5 4 5 3 3 4 2 3 3 4 1 6 2 ¸ 25 24 : 39 23 5 2 1 ¤ 32 2 Note that BA is 2£2 while AB is 3£3: This illustrates the important fact that even when the product exists: AB 6= BA: Note that neither AA nor BB is de…ned. 3.2.1 The Identity Matrix The identify matrix I in matrix algebra plays the same role as 1 in ordinary algebra. De…nition 111 Identity Matrix: The identity matrix I is an n £ n square matrix with ones along the diagonal and zeros on the o¤-diagonal. CHAPTER 3. MATRIX ALGEBRA 110 Note that just as the number 1 has the property: 1£5=5£1 =5 the identity matrix has the same property for matrices; that is: Theorem 112 For all matrices: IA = AI = A: Example: The 3 £ 3 identity matrix is 2 1 I=4 0 0 given by: 3 0 0 1 0 5 0 1 and you can verify that: 2 32 3 2 3 27 18 19 1 0 0 27 18 19 4 13 7 6 5 4 0 1 0 5 = 4 13 7 6 5 : 36 18 14 0 0 1 36 18 14 3.3 The Transpose of a Matrix It is very common in matrix algebra to reverse the rows and columns of a matrix which results in the transpose of a matrix: De…nition 113 Transpose: If A = [aij ] is an m£n matrix then the transpose of A; denoted by: AT is an n £ m matrix where the i; j th element is aji or: AT ´ [aji ] : Remark: A seemingly trivial but remarkably useful fact is that the transpose of a scalar is itself. For example: 5T = 5: Example: 2 3T 3 4 4 ¡2 1 5 6 2 · ¸T 5 2 1 ¡3 3 4 · 3 ¡2 6 4 1 2 2 3 5 ¡3 3 5: = 4 2 1 4 = ¸ and Note that in the …rst case the transpose causes a 3 £ 2 matrix to become a 2 £ 3 matrix while in the second it causes a 2 £ 3 matrix to become a 3 £ 2 matrix. We have: CHAPTER 3. MATRIX ALGEBRA 111 Theorem 114 Transposes satisfy: 1. 
If AB is de…ned then (AB)T = B T AT ¡ ¢T 2. AT = A T 3. (A + B) = AT + B T Remark: The …rst of these results is the trickiest. The key step in …nding (AB)T is that you must …rst reverse the order of multiplication before applying T to A and B: 3.3.1 Symmetric Matrices Recall that for square matrices the diagonal goes from the top left-hand corner to the bottom right-hand corner. For example with: 2 3 1 2 5 A=4 2 3 6 5 5 6 4 the diagonal consists of the elements 1; 3 and 4: The o¤-diagonal elements are then those elements above and below the diagonal. Notice that in this example the elements above the diagonal mirror the elements below the diagonal. We call such matrices symmetric matrices. The precise de…nition of a symmetric matrix is: De…nition 115 Symmetric Matrix: A matrix A is symmetric if and only if A = AT : Remark: Only square matrices can be symmetric since if A is m £ n then AT is n £ m and so A = AT implies that m = n: Example: The matrix A 2 1 2 A=4 2 3 5 6 However the matrices 2 1 B = 4 2 7 2 1 C = 4 2 5 above is symmetric since: A = AT 3 2 3T 2 1 2 5 1 2 5 6 5=4 2 3 6 5 =4 2 3 5 6 4 5 6 4 or 3 5 6 5: 4 B and C below are not symmetric since: 2 3T 2 3 3 1 2 5 1 2 7 2 5 3 6 5 6= B T = 4 2 3 6 5 = 4 2 3 6 5 7 6 4 5 6 4 6 4 2 3T 3 · ¸ 1 2 2 1 2 5 T 4 5 5 3 6= C = 2 3 = : 2 3 6 5 6 6 CHAPTER 3. MATRIX ALGEBRA 3.3.2 112 Proof that AT A is Symmetric In general it is not possible to take powers of matrices. Thus if A is an m £ n matrix then the square of A or A2 = AA is not de…ned unless n = m. Squares only exist for square matrices. However we can always square A as: AT A which is an n £ n matrix or as AAT which is an m £ m matrix. Matrices such as a AT A and AAT turn out to be important in econometrics. These matrices are always symmetric! Thus: Theorem 116 The matrices: AT A and AAT are symmetric. Proof. In general to prove symmetry we begin with C T and try and show that it is equal to C: Thus if C = AT A then: CT ¡ T ¢T A A (de…nition) ¡ ¢ T T = AT AT (since (DE) = E T DT ) ¡ ¢T = AT A (since DT = D) = C (de…nition) =) C is symmetric. = You can show that AAT is symmetric in the same way or use the above result since if D = AAT then D = B T B where B = AT and so D has the form AT A and hence is symmetric. Example: Given: 2 3 3 4 A = 4 ¡2 1 5 6 2 we have AT A and AAT are symmetric as predicted by the Theorem since: 2 3 · ¸ · ¸ 3 4 3 ¡2 6 4 49 22 ¡2 1 5 = AT A = 4 1 2 22 21 6 2 and 2 3 2 3 ¸ 3 4 · 25 ¡2 26 3 ¡2 6 5 ¡10 5 : AAT = 4 ¡2 1 5 = 4 ¡2 4 1 2 6 2 26 ¡10 40 3.4 The Inverse of a Matrix Just as with ordinary numbers we will want to divide with matrices. With ordinary numbers we can express division a ¥ b using multiplication and inverse CHAPTER 3. MATRIX ALGEBRA 113 as: a ¥ b ´ a £ b¡1 . 
Now replacing a and b with two matrices A and B; we already know how to multiply them as A £ B; so if we can …nd the analogue of B ¡1 or the inverse of a matrix, we will be able to extend division to matrices as A ¥ B ´ A £ B ¡1 : Returning to ordinary numbers, the inverse of 3 is 13 which satis…es 3 £ 13 = £ 3 = 1: Now in matrix algebra the role of 1 is played by the identity matrix I and so we have: 1 3 De…nition 117 Matrix Inverse: The inverse of a square n £ n matrix A is an n £ n; matrix denoted by A¡1 ; which satis…es: A¡1 A = AA¡1 = I: Remark 1: Generally in matrix algebra we write A £ B ¡1 rather than A ¥ B: Remark 2: With ordinary numbers we often expression division as a ¥ b ´ ab : The notation ab works for ordinary numbers since the order of multiplication does not matter; that is a £ b¡1 = b¡1 £ a ´ ab : For matrices the order of multiplication does matter since A£B ¡1 6= B ¡1 £A: Thus it is a bad notation A to write for matrices: B since it does not indicate whether you mean A £ B ¡1 A ¡1 : or B £ A: Thus for two matrices A and B do not write B Remark 3: A matrix must be square to have an inverse. For example: 2 is not de…ned. 3¡1 3 4 4 ¡2 1 5 6 2 Remark 4: Not all square matrices have an inverse. For example the scalar 0 does not have an inverse nor does any n £ n square matrix 0 have an inverse since 0A = A0 = 0 for all matrices but if A were the inverse of 0 we would have A0 = I; a contradiction. There are also square matrices with non-zero elements which do not have an inverse. We use the following terminology: De…nition 118 Non-Singular Matrices: If a matrix A has an inverse we say that A is non-singular or invertible. De…nition 119 Singular Matrices: If a matrix A does not have an inverse we say A is singular or non-invertible. CHAPTER 3. MATRIX ALGEBRA 114 Example 1: An example of a non-singular matrix is: · ¸ 49 22 22 21 which has an inverse · 49 22 22 21 ¸ £ ¸¡1 1 = 545 · 21 ¡22 ¡22 49 ¸ since: 1 545 · 21 ¡22 ¡22 49 · 49 22 22 21 ¸ = 1 545 · 545 0 0 545 ¸ = · 1 0 0 1 ¸ which you can verify on your own by carrying out the multiplication. Example 2: A matrix with non-zero elements which does not have an inverse (or is singular or non-invertible) is: · ¸ 1 2 A= : 1 2 Proof. We use proof by contradiction. Assume to the contrary that a matrix B = A¡1 exists. Since BA = I by the de…nition of an inverse we have: · ¸· ¸ · ¸ b11 b12 1 2 1 0 = : 1 2 0 1 b21 b22 Carrying out the multiplication we …nd from multiplying the …rst row of B with the …rst column of A that: b11 + b12 = 1 while multiplying the …rst row of B with the second column of A gives: 2b11 + 2b12 = 0 =) b11 + b12 = 0: Combining these two results we obtain the contradiction: 1 = 0. Thus A does not have an inverse. Here are some useful results for inverses: Theorem 120 If A has an inverse then it is unique. ¡ ¢¡1 Theorem 121 If A¡1 exists then A¡1 = A. ¡ ¢¡1 ¡ ¡1 ¢T Theorem 122 If A¡1 exists then AT = A : Theorem 123 If A and B are non-singular matrices of the same order then (AB) ¡1 = B ¡1 A¡1 : CHAPTER 3. MATRIX ALGEBRA 115 Theorem 124 If A is a 2 £ 2 matrix then its inverse exists if and only if a11 a22 ¡ a12 a21 6= 0 and is given by: · a11 ¡1 A = a21 ¸¡1 a12 a22 1 = a11 a22 ¡ a12 a21 Example: The matrix: A= · 1 2 1 2 · a22 ¡a21 ¡a12 a11 ¸ : ¸ does not have an inverse (or is singular) since: a11 a22 ¡ a12 a21 = 1 £ 2 ¡ 2 £ 1 = 0: Remark 1: Note the similarity of (AB)T = B T AT and (AB)¡1 = B ¡1 A¡1 where for both the order is reversed before applying either T or ¡1 to the individual matrices. 
Remark 2: Later on we will see that for n = 2 the scalar a11 a22 ¡ a12 a21 is the determinant of A or: det [A] = a11 a22 ¡ a12 a21 : Remark 3: Note from Theorem 122 that we can always reverse the order of the transpose T and inverse ¡1 : A consequence of this is that: Theorem 125 If A is symmetric and A¡1 exists, then A¡1 is symmetric. Proof. If A is symmetric then AT = A: Now: ¡ ¡1 ¢T ¡ T ¢¡1 A = A = A¡1 and so A¡1 is symmetric. Example 1: The symmetric matrix: · ¸ 9 3 A= 3 2 has an inverse since: a11 a22 ¡ a12 a21 = 9 £ 2 ¡ 3 £ 3 = 9 6= 0 and A¡1 is given by: · which is also symmetric. 9 3 3 2 ¸¡1 1 = 9 · = · 2 ¡3 ¡3 9 ¸ 2 1 ¡3 9 ¡ 13 1 ¸ CHAPTER 3. MATRIX ALGEBRA 3.4.1 116 Diagonal Matrices Generally speaking multiplying and inverting matrices is di¢cult and best left to computers. There is at least one important special case for which multiplication and inversion is easy. We have: De…nition 126 Diagonal Matrix: If A = [aij ] is an n£n matrix with aij = 0 for i 6= j or 2 3 a11 0 ¢ ¢ ¢ 0 6 0 a22 ¢ ¢ ¢ 0 7 6 7 A=6 . 7 . . .. .. 4 .. 0 5 0 ¢¢¢ 0 ann then A is a diagonal matrix. Example: For the matrices below: 2 3 2 3 2 3 3 0 0 3 0 7 3 0 6 4 0 2 0 5;4 4 2 0 5;4 4 2 0 5 0 0 4 0 0 4 3 0 4 the …rst is a diagonal matrix while the second and third are not. Diagonal matrices are easy to multiply, you just multiply the corresponding diagonal elements. Thus: Theorem 127 If A and B are diagonal matrices of 32 2 b11 0 a11 0 ¢ ¢ ¢ 76 0 6 0 a22 ¢ ¢ ¢ 0 76 6 A£B = 6 . 76 . .. .. 4 .. . 0 5 4 .. . 0 ¢¢¢ 0 ann 0 2 a11 b11 0 ¢¢¢ 0 6 b ¢ ¢ ¢ 0 0 a 22 22 6 = 6 .. .. . . 4 . . 0 . 0 ¢¢¢ 0 ann bnn the same order then: 3 0 ¢¢¢ 0 0 7 b22 ¢ ¢ ¢ 7 7 .. .. . 0 5 . ¢¢¢ 0 bnn 3 7 7 7: 5 Remark: Note that for diagonal matrices AB = BA! Example: Given: 2 3 2 3 2 0 0 5 0 0 A = 4 0 3 0 5; B = 4 0 6 0 5 0 0 4 0 0 7 CHAPTER 3. MATRIX ALGEBRA 117 we have: 2 32 3 2 3 2 3 2 0 0 5 0 0 5£2 0 0 10 0 0 AB = 4 0 3 0 5 4 0 6 0 5 = 4 0 3£6 0 5 = 4 0 18 0 5 : 0 0 4 0 0 7 0 0 4£7 0 0 28 Finding the inverse of diagonal matrices is also very easy, you merely take the inverse of each element along the diagonal. Thus Theorem 128 A diagonal matrix A is non-singular onal elements are non-zero in which case: 2 1 0 ¢¢¢ 0 a11 1 6 0 ¢ ¢ ¢ 0 a22 6 A¡1 = 6 . .. .. 4 .. . 0 . 1 0 ¢¢¢ 0 ann if and only if all its diag3 7 7 7: 5 Example 1: 2 3¡1 2 1 3 0 0 3 4 0 2 0 5 =4 0 0 0 4 0 0 1 2 0 3 0 0 5: 1 4 Example 2: Since the identity matrix I is diagonal with 10 s along the diagonal, it follows that: I ¡1 = I: Example 3: The diagonal matrix: 2 3 3 0 0 4 0 2 0 5 0 0 0 is singular (or non-invertible or it does not have an inverse) since the third diagonal element is zero. 3.5 The Determinant of a Matrix An important characteristic of a square matrix is its determinant. If A is an n £ n matrix then we write its determinant as j A j or det [A] : To begin the determinant when A is a 1 £ 1 scalar or a 2 £ 2 matrix is given by: De…nition 129 If A is a 1 £ 1 scalar then det [A] = A while if A = [aij ] is a 2 £ 2 matrix: · ¸ a11 a12 det [A] = det = a11 a22 ¡ a12 a21: a21 a22 CHAPTER 3. MATRIX ALGEBRA 118 Example: det [5] = 5; det [¡3] = ¡3 while: · ¸ 5 1 det = 5 £ 3 ¡ 1 £ 4 = 11: 4 3 To de…ne determinants properly for n ¸ 3 is somewhat complicated since it involves the concept of a permutation. Rather than going into this we will instead use the Laplace expansion to reduce the calculation of an nth order th th determinant to a series of (n ¡ 1) order determinants. 
These (n ¡ 1) order determinants are called minors and are found by removing one row and one column from a matrix. De…nition 130 Minors: The i; j th minor of a matrix A; denoted by mij ; is given by: mij = det [Aij ] where Aij is the (n ¡ 1) £ (n ¡ 1) matrix obtained by removing the ith row and the j th column of A: We then de…ne the i; j th cofactor as either mij if i + j is even, or ¡mij if i + j is odd. Thus: De…nition 131 Cofactors: The i; j th cofactor of a matrix A; denoted by cij ; is given by: cij = (¡1)i+j mij where mij is the i; j th minor of A: Example: Consider the 3 £ 3 matrix: 2 3 3 1 4 A = 4 1 2 6 5: 3 1 8 The 1; 1 minor: m11 is obtained by removing the …rst row and …rst column so that · ¸ 2 6 = 10: m11 = det 1 8 Since 1 + 1 = 2 is even, the 1; 1 cofactor is c11 = (¡1)1+1 £ 10 = 10: To calculate the 3; 2 minor: m32 we remove the third row and the second column of A to obtain: · ¸ 3 4 = 14 m32 = det 1 6 CHAPTER 3. MATRIX ALGEBRA 119 and since 3 + 2 = 5 is odd, the cofactor is the negative of m32 : c32 = (¡1)3+2 £ 14 = ¡14: Remark: When calculating the cofactors cij there is a pattern of alternating 10 s and ¡10 s that are applied to the minors mij that looks like this: 2 3 1 ¡1 1 ¢¢¢ 6 ¡1 1 ¡1 ¢ ¢ ¢ 7 6 7 : 6 1 ¡1 1 ¢¢¢ 7 4 5 .. .. .. . . . . . . Notice that the diagonal elements always have matrix this pattern is: 2 1 ¡1 1 ¡1 6 ¡1 1 ¡1 1 6 4 1 ¡1 1 ¡1 ¡1 1 ¡1 1 1: For example with a 4 £ 4 3 7 7: 5 The Laplace expansion then states that det [A] can be found by moving across any row or down any column of A; multiplying each element in that row or column aij by its cofactor cij , and then summing. Theorem 132 Laplace Expansion: Given an n£n matrix A = [aij ] with cofactors cij then det [A] is given either as the sum of the products of the elements of the ith row with their cofactors as: det [A] = ai1 ci1 + ai2 ci2 + ai3 ci3 + ¢ ¢ ¢ + ain cin or as the sum of the products of the elements of the j th column with their cofactors as: det [A] = a1j c1j + a2j c2j + a3j c3j + ¢ ¢ ¢ + anj cnj : Here is a recipe for calculating a determinant: Recipe for Calculating det [A] 1. Pick any row or column of A and move down that row or column. 2. When you get to a particular element aij delete the corresponding row and column, take the determinant of what is left over to obtain the minor mij , and multiply the two as: aij £mij . CHAPTER 3. MATRIX ALGEBRA 120 3. Multiply the result in step 2: by either ¡1 or 1 depending on whether i + j is odd or even. 4. Continue to the next element in the row or column and add all the terms you obtained in step 3: together. Remark: In general calculating determinants is di¢cult and best left to computers, which use more e¢cient algorithms than the Laplace expansion. Unless A has some special properties, you are unlikely to have to calculate by hand determinants larger than 4 £ 4: Example: Consider the 3 £ 3 matrix: 2 3 3 1 4 A = 4 1 2 6 5: 3 1 8 To calculate det [A] let us begin by going across the …rst row. Coming to the …rst element: a11 = 3 we remove the …rst row and …rst column, take the determinant of what is left over and multiply by a11 = 3 to obtain: · ¸ 2 6 1+1 (¡1) £ 3 £ det = 30: 1 8 Since the sum of the rows and columns of a11 is 1 + 1 = 2, which is even, we 1+1 = 1 and so this term does nothing to the result. 
have (¡1) We now move across the row to the next element a12 = 1: Removing the corresponding row and column and taking the determinant we obtain: · ¸ 1 6 1+2 (¡1) £ 1 £ det = 10: 3 8 1+2 = ¡1 and so changes the Since 1 + 2 = 3 is an odd number, the term (¡1) sign of the result. Finally we come to the last element of the row a13 = 4: Removing the corresponding row and column we obtain: · ¸ 1 2 1+3 (¡1) 4 £ det = ¡20: 3 1 1+3 = 11 and this term does nothing to Since 1 + 3 = 4 is even, the term (¡1) the result. Thus adding all these results together we …nd that: det [A] = a11 c11 + a22 c22 + a33 c33 = 30 + 10 ¡ 20 = 20: Notice the pattern of pluses and minus here is: 1; ¡1; 1. CHAPTER 3. MATRIX ALGEBRA 121 We could also have calculated det [A] above by going across the second row as: det [A] = a21 c21 + a22 c22 + a23 c23 · ¸ · ¸ · ¸ 1 4 3 4 3 1 = ¡1 £ (1) £ det + (2) £ det + ¡1 £ (6) £ det 1 8 3 8 3 1 = 20: (here the pattern of pluses and minus here is: ¡1; 1; ¡ 1) or by going down the third column as: det [A] = a13 c31 + a23 c23 + a33 c33 · ¸ · ¸ · ¸ 1 2 3 1 3 1 = 4 £ det ¡ 6 £ det + 8 £ det 3 1 3 1 1 2 = 20: Notice the pattern of pluses and minus here is: 1; ¡1; 1. Although determinants are hard to numerically calculate, there are a number of results which make theoretical manipulations of determinants quite easy. In particular: Theorem 133 If A and B are square n £ n matrices then 1. det [AB] = det [A] det [B] : £ ¤ 2. det AT = det [A] ¤ £ 3. If A is non-singular then det A¡1 = 1 det[A] : 4. If B is obtained by switching any two rows or any two columns of A then: det [B] = ¡ det [A] : 5. If B is obtained by adding one row or column of A to another row or column of A then det [B] = det [A] : 6. If ® is a scalar and A is an n £ n matrix then det [®A] = ®n det [A] : One of the reasons we are interested in determinants is that they tell us whether or not a matrix A has an inverse; in particular a necessary and su¢cient condition for the inverse to exist is that the determinant not be 0 or: Theorem 134 Given an n £ n matrix A the inverse A¡1 exists if and only if det [A] 6= 0: Theorem 135 Given an n £ n matrix A the inverse A¡1 does not exist if and only if det [A] = 0: CHAPTER 3. MATRIX ALGEBRA 122 Remark: The only scalar that does not have an inverse is 0: While there are square matrices with non-zero elements that do not have an inverse, they nevertheless have a zero-like quality, in particular their determinant must be zero. Later we will see that non-invertible matrices also must have a 0 eigenvalue as well. Remark: From result 1. of the theorem it follows that if either A or B is singular then AB is also singular, that is if either det [A] = 0 or det [B] = 0 then det [AB] = det [A] det [B] = 0: Example: For the matrix: 2 3 3 1 4 A=4 1 2 6 5 3 1 8 we showed that det [A] = 20: It follows then that A¡1 exists. Without actually calculating A¡1 we know that ¤ £ det A¡1 = 1 1 = : det [A] 20 £ ¤ We also know that det AT = 20: Suppose we multiplied every element of A by 2 so that: 2 3 2 3 3 1 4 6 2 8 B = 2A = 2 4 1 2 6 5 = 4 2 4 12 5 : 3 1 8 6 2 16 Then since A is 3 £ 3 it follows that: det [B] = 23 det [A] = 8 £ 20 = 160: 3.5.1 Determinants of Upper and Lower Triangular Matrices Determinants are in general di¢cult to compute. 
Two types of matrices for which determinants are easy to compute are upper and lower triangular matrices: De…nition 136 Upper Triangular Matrix: An n £ n matrix A = [aij ] is upper triangular matrix if it has all zeros below the diagonal or: 2 3 a11 a12 ¢ ¢ ¢ a1n 6 0 a22 ¢ ¢ ¢ a2n 7 6 7 A=6 . 7 .. .. 4 .. 5 . a . n¡1n 0 ¢¢¢ 0 ann CHAPTER 3. MATRIX ALGEBRA 123 De…nition 137 Lower Triangular Matrix: An n £ n matrix A = [aij ] is lower triangular matrix if it has all zeros above the diagonal or 2 3 a11 0 ¢¢¢ 0 6 a21 a22 ¢¢¢ 0 7 6 7 A=6 . 7: . . .. .. 4 .. 0 5 an1 ¢ ¢ ¢ ann¡1 ann Remark: A diagonal matrix is both upper and lower triangular. Determinants of triangular matrices are easy to calculate. We have: Theorem 138 For either upper or lower triangular matrices the determinant is the product of the diagonal elements. From this it follows that: Theorem 139 A lower or upper triangular matrix is non-singular if and only if all diagonal elements are non-zero. Example 1: Given: 2 3 2 3 2 3 3 1 4 3 0 0 3 0 0 A = 4 0 2 6 5; B = 4 1 2 0 5; C = 4 0 2 0 5 0 0 8 3 1 8 0 0 8 then A is upper triangular, B is lower triangular and C is both upper and lower triangular, that is C is a diagonal matrix. We have: 2 3 3 1 4 det [A] = det 4 0 2 6 5 = 3 £ 2 £ 8 = 48; 0 0 8 2 3 3 0 0 det [B] = det 4 1 2 0 5 = 3 £ 2 £ 8 = 48; 3 1 8 2 3 3 0 0 det [C] = det 4 0 2 0 5 = 3 £ 2 £ 8 = 48: 0 0 8 Example 2: The matrix: 2 3 3 1 4 D=4 0 2 6 5 0 0 0 CHAPTER 3. MATRIX ALGEBRA 124 does not have an inverse since det [D] = 0: Example 3: Since the identity matrix is a diagonal matrix it follows that det [I] = 1: We can use this to prove that if det [A] = 0 then A does not have an inverse. The proof is by contradiction. Suppose then that A¡1 existed and det [A] = 0: Then: £ ¤ ¤ £ 1 = det [I] = det AA¡1 = det [A] det A¡1 = 0 | {z } =0 so that 1 = 0, a contradiction. It follows then that A¡1 does not exist if det [A] = 0: 3.5.2 Calculating the Inverse of a Matrix with Determinants Determinants can be used to calculate the inverse of a matrix using the cofactor and adjoint matrices de…ned as: De…nition 140 Cofactor Matrix: Let A be an n£n square matrix and de…ne the n £ n cofactor matrix C = [cij ] where cij is the i; j th cofactor of A. De…nition 141 Adjoint Matrix: The adjoint matrix of A; written as adj [A] ; is de…ned as the transpose of the cofactor matrix C or : adj [A] = C T : The following result holds for the adjoint matrix: Theorem 142 For any square matrix A : adj [A] £ A = A £ adj [A] = det [A] I: Remark 1: If you carry out the matrix multiplication A£adj [A] for the ith row of A and the ith column of adj [A] and equate this to the i; i element of det [A] I; which is just det [A] ; you will see that this is just the Laplace expansion for det [A]. The result states further that the ith row of A is orthogonal to the j th row of adj [A] : The adjoint matrix adj [A] is nearly the inverse A¡1 since AA¡1 = I while A £ adj [A] = I £ jAj. We thus have: Theorem 143 If det [A] 6= 0 then: A¡1 = 1 adj [A] : det [A] CHAPTER 3. 
MATRIX ALGEBRA 125 Example 1: For the case of 2 £ 2 matrices: · ¸ a11 a12 A= a21 a22 since det [A] = a11 a22 ¡ a12 a22 ; and the cofactor and adjoint matrices are: · · ¸ ¸ a22 ¡a21 a22 ¡a12 T C= and adj [A] = C = ¡a12 a11 ¡a21 a11 it follows that: A¡1 = Example 2: Consider: 1 a11 a22 ¡ a12 a22 · a22 ¡a21 ¡a12 a11 ¸ : 2 3 3 1 4 A=4 1 2 6 5 3 1 8 which we showed earlier had a determinant of det [A] = 20: The cofactor matrix is given by: 2 3 10 10 ¡5 12 0 5: C = 4 ¡4 ¡2 ¡14 5 For example the 3; 2 element is calculated as the cofactor: · ¸ 3 4 c32 = (¡1)3+2 det = ¡14: 1 6 The adjoint matrix is then found by taking the transpose of C so that: 2 3T 2 3 10 10 ¡5 10 ¡4 ¡2 12 0 5 = 4 10 12 ¡14 5 : adj [A] = 4 ¡4 ¡2 ¡14 5 ¡5 0 5 Note that A £ adj [A] = I £ det [A] is satis…ed since: 2 32 3 2 3 3 1 4 10 ¡4 ¡2 20 0 0 4 1 2 6 5 4 10 12 ¡14 5 = 4 0 20 0 5 3 1 8 ¡5 0 5 0 0 20 2 3 1 0 0 = 20 4 0 1 0 5 : 0 0 1 Thus the inverse of A is: 2 3 2 1 10 ¡4 ¡2 2 1 4 10 12 ¡14 5 = 4 1 A¡1 = 2 20 ¡5 0 5 ¡ 14 ¡ 15 3 5 0 3 1 ¡ 10 7 5 ¡ 10 : 1 4 CHAPTER 3. MATRIX ALGEBRA 3.6 126 The Trace of a Matrix Besides determinants another important characteristic of square matrices, especially in econometrics, is the sum of the diagonal elements or the trace de…ned as: De…nition 144 Trace: If A is a square matrix then the trace of A is denoted by: tr [A] is 3 2 ¢¢¢ a1n a11 a12 6 a21 a22 ¢¢¢ a2n 7 7 6 tr 6 . 7 = a11 + a22 + ¢ ¢ ¢ + ann : . . . . . 4 . . an¡1;n 5 . an1 ¢ ¢ ¢ an;n¡1 ann Example: 2 3 3 1 4 tr 4 1 2 6 5 = 3 + 2 + 8 = 13: 3 1 8 Two important results to remember when manipulating traces are: Theorem 145 tr [A + B] = tr [A] + tr [B] Theorem 146 tr [AB] = tr [BA] Remark: The second property is often very useful in econometrics. We know that for matrices: AB 6= BA: Inside the trace operator however we are free to reverse the order of multiplication. Example 1: Note that: 2 3 2 3 ¸ 3 4 · 3 18 19 5 2 1 4 ¡2 1 5 = 4 ¡13 ¡1 2 5 ¡3 3 4 6 2 24 18 14 has a trace of 3 + ¡1 + 14 = 16 while: 2 3 · ¸ · ¸ 3 4 5 2 1 4 17 24 5 ¡2 1 = ¡3 3 4 9 ¡1 6 2 has a trace of 17 + ¡1 = 16: Thus while the two matrix products AB and BA are di¤erent, their traces, or the sums of their diagonal elements, are the same. Example 2: If X is an n £ p matrix then an important matrix in econometrics is the n £ n matrix: ¢¡1 T ¡ X : P = X XT X CHAPTER 3. MATRIX ALGEBRA 127 ¡ ¢ ¡ ¢¡1 Note that X T X and X T X are p £ p matrices. We have: 3 2 A B z }| {z}|{ 6 ¡ T ¢¡1 T 7 X 7 tr [P ] = tr 6 5 4X X X h ¡ ¢¡1 i = tr X T X X T X = tr [I] = p since I is the p £ p identity matrix. 3.7 Higher Dimensional Spaces 3.7.1 Vectors as Points in an n Dimensional Space: <n This work is dedicated by a humble native of Flatland in the hope that, even as he was initiated into the mysteries Of THREE Dimensions, having been previously conversant with ONLY TWO, so the citizens of that celestial region may aspire yet higher and higher to the secrets of FOUR FIVE OR EVEN SIX dimensions, thereby contributing to the enlargement of THE IMAGINATION and the possible development of that most rare and excellent gift of MODESTY among the superior races of SOLID HUMANITY. -Edwin Abbott-Flatland Edwin Abbott’s book is about the inhabitants of Flatland, a world than unlike our three dimensional world has only two dimensions: forwards and backwards, right and left but no up and down). In the book a native of ‡atland communicates with someone from our three dimensional world who tries to convince him that, besides the two dimensions he experiences there is yet another third dimension: up and down. 
3.7 Higher Dimensional Spaces

3.7.1 Vectors as Points in an n-Dimensional Space: $\mathbb{R}^{n}$

    This work is dedicated by a humble native of Flatland in the hope that, even as he was initiated into the mysteries of THREE Dimensions, having been previously conversant with ONLY TWO, so the citizens of that celestial region may aspire yet higher and higher to the secrets of FOUR FIVE OR EVEN SIX dimensions, thereby contributing to the enlargement of THE IMAGINATION and the possible development of that most rare and excellent gift of MODESTY among the superior races of SOLID HUMANITY.
    --Edwin Abbott, Flatland

Edwin Abbott's book is about the inhabitants of Flatland, a world that, unlike our three-dimensional world, has only two dimensions: forwards and backwards, right and left, but no up and down. In the book a native of Flatland communicates with someone from our three-dimensional world who tries to convince him that, besides the two dimensions he experiences, there is yet another, third dimension: up and down. The difficulties the Flatlander experiences in grasping this third dimension mirror our own difficulties in trying to understand the possibility of, say, a four-dimensional space.

As economists we work with higher-dimensional spaces all the time. For example, in econometrics if you have 100 observations of data, then this is represented as a point in a 100-dimensional space. Fortunately we do not have to visually imagine such a space; instead we simply write down our data as a $100\times 1$ column vector.

To see why this makes sense, think of a point in one dimension, that is, along a line, say along a particular street that runs north/south. Someone asks you where your favourite cafe is and you tell them it is 3 blocks north of here. This number 3 can then be thought of as a $1\times 1$ column vector $[3]$, as can any point along the street, with negative numbers used to indicate points south.

Now consider a two-dimensional space, say the location in a city on any street. Now someone asks you where your favourite cafe is and you say: "Go 3 blocks north of here and 4 blocks east." This can now be represented by the $2\times 1$ column vector:
\[ \begin{bmatrix} 3\\ 4\end{bmatrix}. \]

Now consider three-dimensional space. Suppose the cafe is on the 10th floor of a building. You now say: "Go 3 blocks north of here, 4 blocks east, and up 10 floors." This can be represented by the $3\times 1$ column vector:
\[ \begin{bmatrix} 3\\ 4\\ 10\end{bmatrix}. \]

Let us now try to imagine two four-dimensional beings, one of whom tells his friend how to get to his favourite cafe. Just as we would give directions with three numbers, he would have to give directions with four numbers, one for each dimension. Although we cannot visually imagine it, we could easily write down the $4\times 1$ column vector he would give; for example, it might be:
\[ \begin{bmatrix} 3\\ 4\\ 10\\ 2\end{bmatrix} \]
where $2$ represents how far you would have to go in the extra fourth direction. Thus while we cannot visualize spaces of four dimensions or higher, we can easily write down vectors of any dimension, and so we are actually able to investigate spaces of any dimension. We thus have:

Definition 147 A point in an $n$-dimensional space is represented by an $n\times 1$ column vector. This $n$-dimensional space, or Euclidean space, is denoted by $\mathbb{R}^{n}$.

3.7.2 Length and Distance

Once we make this leap to higher-dimensional spaces, it is natural to ask which of the properties of the three-dimensional space we are familiar with can be extended to $n$-dimensional spaces. The first important characteristic is length or distance. No doubt a four-dimensional person would also want to know how far away his favourite cafe is! We have:

Definition 148 The length of an $n\times 1$ vector $x$ is:
\[ \|x\|=\sqrt{x^{T}x}=\sqrt{x_{1}^{2}+x_{2}^{2}+\cdots+x_{n}^{2}}. \]

Definition 149 The distance between two vectors $x$ and $y$ is $\|x-y\|$.

Example: If
\[ x=\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix},\qquad y=\begin{bmatrix} 5\\ 6\\ 7\end{bmatrix} \]
then the lengths of $x$ and $y$ and the distance between $x$ and $y$ are given by:
\[ \|x\|=\sqrt{1^{2}+2^{2}+3^{2}}=\sqrt{14}=3.74,\qquad \|y\|=\sqrt{5^{2}+6^{2}+7^{2}}=\sqrt{110}=10.49, \]
\[ \|x-y\|=\sqrt{(1-5)^{2}+(2-6)^{2}+(3-7)^{2}}=\sqrt{48}=6.93. \]

Two important results for advanced work are:

Theorem 150 $\|x\|=0$ if and only if $x=0$; that is, $x$ is an $n\times 1$ vector of zeros.

Theorem 151 Triangle Inequality: $\|x+y\|\leq\|x\|+\|y\|$.

Remark: The triangle inequality basically states that if you walk along a straight line to the point $x+y$, then you walk a shorter distance than if you walk first to $x$ (or to $y$) and then on to $x+y$; in other words, the shortest distance between two points in an $n$-dimensional space is still a straight line!

Example: Given $x$ and $y$ above, the triangle inequality is satisfied since:
\[ \|x+y\|=\sqrt{(1+5)^{2}+(2+6)^{2}+(3+7)^{2}}=\sqrt{200}=14.14<\|x\|+\|y\|=\sqrt{14}+\sqrt{110}=14.23. \]
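A minimal numerical check of these lengths and of the triangle inequality (again a sketch of ours in Python with numpy):

    import numpy as np

    x = np.array([1., 2., 3.])
    y = np.array([5., 6., 7.])
    print(np.linalg.norm(x), np.linalg.norm(y))   # 3.74... and 10.48...
    print(np.linalg.norm(x - y))                  # 6.92... = sqrt(48)
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    print(np.linalg.norm(x + y), np.linalg.norm(x) + np.linalg.norm(y))   # 14.14 < 14.23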
3.7.3 Angle and Orthogonality

The second important basic concept for higher-dimensional spaces is angle. The angle between two vectors $x$ and $y$ can be sensibly defined as follows:

Definition 152 Angle: Given two $n\times 1$ vectors $x$ and $y$, the angle between $x$ and $y$ is the $\theta$ defined by:
\[ \cos(\theta)=\frac{x^{T}y}{\|x\|\,\|y\|}. \]

Example: If
\[ x=\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix},\qquad y=\begin{bmatrix} 6\\ 1\\ 7\end{bmatrix} \]
then you can verify that:
\[ x^{T}y=29,\qquad\|x\|=\sqrt{14},\qquad\|y\|=\sqrt{86} \]
so that:
\[ \cos(\theta)=\frac{x^{T}y}{\|x\|\,\|y\|}=\frac{29}{\sqrt{14}\sqrt{86}}=0.835. \]
Using the inverse function $\cos^{-1}$ from your calculator we can recover $\theta$ as $\theta=\cos^{-1}(0.835)=33.3$, so that the angle between $x$ and $y$ is $33.3$ degrees.

Corresponding to the requirement in trigonometry that $|\cos(\theta)|\leq 1$ we have:

Theorem 153 Cauchy-Schwarz Inequality:
\[ |x^{T}y|\leq\|x\|\,\|y\|=\sqrt{x^{T}x}\sqrt{y^{T}y}. \]

Proof. If $x=0$ the inequality holds trivially, so assume $x\neq 0$. Let $\alpha$ be a scalar and define $f(\alpha)$ by:
\[ f(\alpha)=\|\alpha x-y\|^{2}\geq 0. \]
Now:
\[ f(\alpha)=(\alpha x-y)^{T}(\alpha x-y)=\alpha^{2}\|x\|^{2}-2\alpha x^{T}y+\|y\|^{2}. \]
The global minimum of $f(\alpha)$ occurs at the $\alpha^{*}$ where $f'(\alpha^{*})=0$, since $f''(\alpha)=2\|x\|^{2}>0$, so that:
\[ 2\alpha^{*}\|x\|^{2}-2x^{T}y=0\implies\alpha^{*}=\frac{x^{T}y}{\|x\|^{2}}. \]
Thus:
\[ f(\alpha^{*})\geq 0\implies\alpha^{*2}\|x\|^{2}-2\alpha^{*}x^{T}y+\|y\|^{2}\geq 0
\implies\left(\frac{x^{T}y}{\|x\|^{2}}\right)^{2}\|x\|^{2}-2\left(\frac{x^{T}y}{\|x\|^{2}}\right)x^{T}y+\|y\|^{2}\geq 0 \]
\[ \implies\|y\|^{2}\geq\frac{\left(x^{T}y\right)^{2}}{\|x\|^{2}}
\implies\left(x^{T}y\right)^{2}\leq\|x\|^{2}\|y\|^{2}
\implies|x^{T}y|\leq\sqrt{x^{T}x}\sqrt{y^{T}y}. \]

Remark: The equality $|x^{T}y|=\|x\|\,\|y\|$ occurs only if $y=\delta x$ where $\delta$ is some scalar. In this case the angle between $x$ and $y$ is $0$ with $\cos(0)=1$ (or, if $\delta<0$, the angle is $180$ degrees with $\cos\left(180^{\circ}\right)=-1$).

Example: As an illustration of the Cauchy-Schwarz inequality, note that in the example above:
\[ |x^{T}y|=29<\|x\|\,\|y\|=\sqrt{14}\times\sqrt{86}=34.7. \]

The most important angle that we will be concerned with is where two vectors are at right angles, or $\theta=90^{\circ}$, in which case $\cos\left(90^{\circ}\right)=0$ and hence $x^{T}y=0$.

Definition 154 Orthogonality: If $x^{T}y=0$ we say that $x$ and $y$ are orthogonal to each other, or are at right angles to each other. Sometimes this is denoted by $x\perp y$.

(Figure: orthogonal and non-orthogonal pairs of vectors in $\mathbb{R}^{2}$; omitted.)

Remark: Since $x^{T}y$ is a scalar it follows that:
\[ x^{T}y=\left(x^{T}y\right)^{T}=y^{T}x \]
so that:
\[ x^{T}y=0\iff y^{T}x=0 \]
and so you can check for orthogonality by calculating either $x^{T}y$ or $y^{T}x$.

Example 1: In the previous example $x$ and $y$ are not orthogonal since $x^{T}y=29\neq 0$.

Example 2: Two orthogonal vectors are:
\[ x=\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix}\text{ and } y=\begin{bmatrix} 6\\ -3\\ 0\end{bmatrix} \]
since $x^{T}y=1\times 6+2\times(-3)+3\times 0=0$.

Suppose $x$ and $y$ are orthogonal, so the angle between $x$ and $y$ is $90^{\circ}$. Then $x$ and $y$ form a right-angled triangle with $x$ on one side, $y$ on the other, and the sum $x+y$ on the hypotenuse. You may recall from geometry that for right-angled triangles the Pythagorean relationship $a^{2}+b^{2}=c^{2}$ holds. The same, it turns out, holds for $x$ and $y$ in an $n$-dimensional space. In particular:

Theorem 155 Pythagorean Relationship: If $x$ and $y$ are orthogonal $n\times 1$ vectors then:
\[ \|x+y\|^{2}=\|x\|^{2}+\|y\|^{2}. \]

Proof. If $x$ and $y$ are orthogonal then $x^{T}y=y^{T}x=0$ and so:
\[ \|x+y\|^{2}=(x+y)^{T}(x+y)=x^{T}x+y^{T}y+\underbrace{x^{T}y}_{=0}+\underbrace{y^{T}x}_{=0}=\|x\|^{2}+\|y\|^{2}. \]

(Figure: the Pythagorean relationship for orthogonal vectors; omitted.)
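The angle, the Cauchy-Schwarz bound and the Pythagorean relationship can all be verified the same way (an illustrative numpy sketch):

    import numpy as np

    x = np.array([1., 2., 3.]); y = np.array([6., 1., 7.])
    cos_t = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    print(np.degrees(np.arccos(cos_t)))           # about 33.3 degrees
    print(abs(x @ y), np.linalg.norm(x) * np.linalg.norm(y))   # 29 < 34.7 (Cauchy-Schwarz)

    # Pythagoras for an orthogonal pair
    u = np.array([1., 2., 3.]); v = np.array([6., -3., 0.])
    print(u @ v)                                  # 0: u and v are orthogonal
    print(np.linalg.norm(u + v)**2, np.linalg.norm(u)**2 + np.linalg.norm(v)**2)   # equal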
3.7.4 Linearly Independent Vectors

Consider a two-dimensional space with the two vectors:
\[ a_{1}=\begin{bmatrix} 1\\ 0\end{bmatrix},\qquad a_{2}=\begin{bmatrix} 0\\ 1\end{bmatrix}. \]
These two vectors form a basis for $\mathbb{R}^{2}$; that is, you can describe any point in $\mathbb{R}^{2}$ as a linear combination of $a_{1}$ and $a_{2}$. For example, given the vector $x$ below:
\[ x=\begin{bmatrix} 3\\ 4\end{bmatrix}=3\begin{bmatrix} 1\\ 0\end{bmatrix}+4\begin{bmatrix} 0\\ 1\end{bmatrix}=3a_{1}+4a_{2}. \]
Most, but not all, pairs of two vectors will form a basis for $\mathbb{R}^{2}$. For example, the two vectors:
\[ b_{1}=\begin{bmatrix} 1\\ 0\end{bmatrix},\qquad b_{2}=\begin{bmatrix} 3\\ 0\end{bmatrix} \]
do not form a basis for $\mathbb{R}^{2}$. The key requirement here is that the two vectors be linearly independent of each other, or that they point in different directions. Thus the vector $a_{1}$ points 1 block north while $a_{2}$ points 1 block east, and so they are linearly independent. The two vectors $b_{1}$ and $b_{2}$ both point in the same direction, north, and so are linearly dependent, with $b_{2}=3b_{1}$.

These ideas are extended to higher dimensions as follows:

Definition 156 Linear Independence: Given $n$ vectors $a_{1},a_{2},\ldots,a_{n}$, we say that they are linearly independent if for any scalars $x_{1},x_{2},\ldots,x_{n}$:
\[ a_{1}x_{1}+a_{2}x_{2}+\cdots+a_{n}x_{n}=0\implies x_{1}=0,\ x_{2}=0,\ \ldots,\ x_{n}=0. \]

Definition 157 Given $n$ vectors $a_{1},a_{2},\ldots,a_{n}$, we say that they are linearly dependent if there exist $x_{1},x_{2},\ldots,x_{n}$, at least one of which is not $0$, such that:
\[ a_{1}x_{1}+a_{2}x_{2}+\cdots+a_{n}x_{n}=0. \]

This idea can be written more compactly if we think of the column vectors $a_{1},a_{2},\ldots,a_{n}$ as the columns of a matrix $A$. We have:

Definition 158 The $n$ columns $a_{1},a_{2},\ldots,a_{n}$ of the $m\times n$ matrix $A$ are linearly independent if $Ax=0\implies x=0$, where $x$ is an $n\times 1$ column vector.

Definition 159 If $Ax=0$ for some $x\neq 0$, the columns of $A$ are linearly dependent.

Since vectors which are orthogonal to each other must point in different directions, they must be linearly independent, and so:

Theorem 160 If $a_{1},a_{2},\ldots,a_{n}$ are mutually orthogonal, so that $a_{i}^{T}a_{j}=0$ for $i\neq j$, then they are linearly independent.

Example 1: The vectors
\[ a_{1}=\begin{bmatrix} 1\\ 0\end{bmatrix},\qquad a_{2}=\begin{bmatrix} 0\\ 1\end{bmatrix} \]
are linearly independent since:
\[ x_{1}\begin{bmatrix} 1\\ 0\end{bmatrix}+x_{2}\begin{bmatrix} 0\\ 1\end{bmatrix}=\begin{bmatrix} 0\\ 0\end{bmatrix} \]
can only be satisfied when $x_{1}=0$ and $x_{2}=0$. Alternatively, putting the two vectors in a $2\times 2$ matrix we have:
\[ \begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 0\\ 0\end{bmatrix}\implies\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 0\\ 0\end{bmatrix} \]
and so $a_{1}$ and $a_{2}$ are linearly independent. Alternatively, since:
\[ a_{1}^{T}a_{2}=\begin{bmatrix} 1 & 0\end{bmatrix}\begin{bmatrix} 0\\ 1\end{bmatrix}=0 \]
it follows that $a_{1}$ and $a_{2}$ are orthogonal and hence linearly independent.

Example 2: The vectors
\[ b_{1}=\begin{bmatrix} 1\\ 0\end{bmatrix},\qquad b_{2}=\begin{bmatrix} 3\\ 0\end{bmatrix} \]
are linearly dependent since:
\[ x_{1}\begin{bmatrix} 1\\ 0\end{bmatrix}+x_{2}\begin{bmatrix} 3\\ 0\end{bmatrix}=\begin{bmatrix} 0\\ 0\end{bmatrix} \]
can be satisfied for non-zero $x_{1}$ and $x_{2}$, for example $x_{1}=1$ and $x_{2}=-\frac{1}{3}$. Alternatively, putting $b_{1}$ and $b_{2}$ into a matrix we have, for an $x\neq 0$:
\[ \begin{bmatrix} 1 & 3\\ 0 & 0\end{bmatrix}\begin{bmatrix} 1\\ -\frac{1}{3}\end{bmatrix}=\begin{bmatrix} 0\\ 0\end{bmatrix} \]
so that $b_{1}$ and $b_{2}$ are linearly dependent.

Example 3: Suppose that
\[ a_{1}=\begin{bmatrix} 3\\ -2\\ 6\end{bmatrix}\text{ and } a_{2}=\begin{bmatrix} 4\\ 1\\ 2\end{bmatrix}. \]
You may verify that:
\[ a_{1}x_{1}+a_{2}x_{2}=\begin{bmatrix} 3\\ -2\\ 6\end{bmatrix}x_{1}+\begin{bmatrix} 4\\ 1\\ 2\end{bmatrix}x_{2}=\begin{bmatrix} 0\\ 0\\ 0\end{bmatrix},
\quad\text{or}\quad
\begin{bmatrix} 3 & 4\\ -2 & 1\\ 6 & 2\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 0\\ 0\\ 0\end{bmatrix}, \]
can only be satisfied when $x_{1}=x_{2}=0$. Consequently these two vectors are linearly independent.

Example 4: On the other hand:
\[ \begin{bmatrix} 3\\ -2\\ 6\end{bmatrix}x_{1}+\begin{bmatrix} -6\\ 4\\ -12\end{bmatrix}x_{2}=\begin{bmatrix} 0\\ 0\\ 0\end{bmatrix} \]
is satisfied when $x_{1}=2$ and $x_{2}=1$. Consequently these two vectors are linearly dependent.

The notion of linear independence leads to the rank of a matrix.

Definition 161 Rank of a Matrix: The rank of a matrix $A$, denoted by $\operatorname{rank}[A]$, is the maximum number of linearly independent column vectors of $A$.
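Numerically, linear independence of columns can be checked through the rank; the sketch below (ours) uses numpy's matrix_rank on the examples above:

    import numpy as np

    A = np.array([[1., 0.], [0., 1.]])        # columns a1, a2 of Example 1
    B = np.array([[1., 3.], [0., 0.]])        # columns b1, b2 of Example 2
    print(np.linalg.matrix_rank(A))           # 2: columns linearly independent
    print(np.linalg.matrix_rank(B))           # 1: columns linearly dependent
    C = np.column_stack(([3., -2., 6.], [4., 1., 2.]))   # Example 3
    print(np.linalg.matrix_rank(C))           # 2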
Definition 162 Full Rank: If an $n\times n$ matrix $A$ has $\operatorname{rank}[A]=n$ we say that $A$ has full rank.

We have:

Theorem 163 Properties of the Rank of a Matrix:

1. $\operatorname{rank}[A]=\operatorname{rank}\left[A^{T}\right]$, so the number of linearly independent column vectors equals the number of linearly independent row vectors of $A$.

2. If $A$ is an $m\times n$ matrix then $\operatorname{rank}[A]\leq m$ and $\operatorname{rank}[A]\leq n$.

3. If $A$ is a square $n\times n$ matrix then $A$ is non-singular, or $A^{-1}$ exists, if and only if $\operatorname{rank}[A]=n$.

4. If $A$ is a square $n\times n$ matrix then $A$ is non-singular, or $A^{-1}$ exists, if and only if $Ax=0$ implies that $x=0$, where $x$ is an $n\times 1$ vector.

5. If $A$ is a square $n\times n$ matrix then $A$ is singular, or $A^{-1}$ does not exist, if and only if there exists an $n\times 1$ vector $x\neq 0$ such that $Ax=0$.

Example 1: Consider the square matrix:
\[ A=\begin{bmatrix} 3 & 2\\ 1 & 4\end{bmatrix}. \]
You can verify on your own that $A$ here has a rank of $2$; that is, the vectors
\[ \begin{bmatrix} 3\\ 1\end{bmatrix},\qquad\begin{bmatrix} 2\\ 4\end{bmatrix} \]
are linearly independent, and consequently $\operatorname{rank}[A]=2$ and $A^{-1}$ exists, given by:
\[ \begin{bmatrix} 3 & 2\\ 1 & 4\end{bmatrix}^{-1}=\begin{bmatrix} \frac{2}{5} & -\frac{1}{5}\\ -\frac{1}{10} & \frac{3}{10}\end{bmatrix}. \]

Example 2: The matrix $B$ given by:
\[ B=\begin{bmatrix} 3 & 6\\ 1 & 2\end{bmatrix} \]
has a rank of $1$ since:
\[ \begin{bmatrix} 3\\ 1\end{bmatrix}\times(-2)+\begin{bmatrix} 6\\ 2\end{bmatrix}\times 1=\begin{bmatrix} 0\\ 0\end{bmatrix}, \]
or there exists a non-zero $x$ such that $Bx=0$, namely
\[ x=\begin{bmatrix} -2\\ 1\end{bmatrix}\quad\text{since}\quad\begin{bmatrix} 3 & 6\\ 1 & 2\end{bmatrix}\begin{bmatrix} -2\\ 1\end{bmatrix}=\begin{bmatrix} 0\\ 0\end{bmatrix}. \]
Consequently $B^{-1}$ here does not exist, or $B$ is singular.

It is possible to calculate the rank of a matrix using determinants.

Theorem 164 Given any $m\times n$ matrix $A$, suppose that $r$ is the order of the largest $r\times r$ sub-matrix $\tilde{A}$ of $A$ such that $\det\left[\tilde{A}\right]\neq 0$. Then $\operatorname{rank}[A]=r$.

Example: For the matrix
\[ A=\begin{bmatrix} 1 & 2 & 4 & 5\\ 2 & 5 & 2 & 1\\ 1 & 2 & 3 & 4\end{bmatrix} \]
examples of $2\times 2$ and $3\times 3$ sub-matrices of $A$ would be:
\[ \begin{bmatrix} 1 & 2\\ 2 & 5\end{bmatrix},\qquad\begin{bmatrix} 1 & 2 & 4\\ 2 & 5 & 2\\ 1 & 2 & 3\end{bmatrix}. \]
Since $A$ is $3\times 4$ we cannot obtain any sub-matrix of $A$ larger than $3\times 3$. From the theorem we have $\operatorname{rank}[A]=3$ since for the $3\times 3$ sub-matrix:
\[ \det\begin{bmatrix} 1 & 2 & 4\\ 2 & 5 & 2\\ 1 & 2 & 3\end{bmatrix}=-1\neq 0. \]
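A quick numerical check of this last example (an illustrative sketch):

    import numpy as np

    A = np.array([[1., 2., 4., 5.], [2., 5., 2., 1.], [1., 2., 3., 4.]])
    sub = A[:, :3]                            # the 3x3 sub-matrix used in the text
    print(np.round(np.linalg.det(sub)))       # -1.0, hence rank 3
    print(np.linalg.matrix_rank(A))           # 3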
3.8 Solving Systems of Equations

Matrix algebra is important for solving systems of linear equations. For example, in the demand and supply model:
\[ \text{demand: } Q=6-\frac{3}{2}P,\qquad\text{supply: } Q=2+\frac{1}{2}P \]
(Figure: the demand and supply curves, intersecting at the equilibrium price and quantity; omitted)
we wish to find the equilibrium price and quantity, the $Q$ and $P$ where the demand and supply curves intersect. Now if we set $x_{1}=Q$ and $x_{2}=P$ we can rewrite the demand and supply curves as:
\[ Q=6-\frac{3}{2}P\implies x_{1}=6-\frac{3}{2}x_{2}\implies 2x_{1}+3x_{2}=12 \]
\[ Q=2+\frac{1}{2}P\implies x_{1}=2+\frac{1}{2}x_{2}\implies 2x_{1}-x_{2}=4 \]
or as the system of equations:
\[ 2x_{1}+3x_{2}=12 \]
\[ 2x_{1}-x_{2}=4. \]
This can in turn be written in matrix notation as:
\[ \begin{bmatrix} 2 & 3\\ 2 & -1\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 12\\ 4\end{bmatrix} \]
which is in the form $Ax=b$ where:
\[ A=\begin{bmatrix} 2 & 3\\ 2 & -1\end{bmatrix},\qquad x=\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix},\qquad b=\begin{bmatrix} 12\\ 4\end{bmatrix}. \]

In the above example we have two equations and two unknowns. In general, in order to have a unique solution one needs as many equations as unknowns. Thus consider a system of $n$ equations in $n$ unknowns:
\[ a_{11}x_{1}+a_{12}x_{2}+\cdots+a_{1n}x_{n}=b_{1} \]
\[ a_{21}x_{1}+a_{22}x_{2}+\cdots+a_{2n}x_{n}=b_{2} \]
\[ \vdots \]
\[ a_{n1}x_{1}+a_{n2}x_{2}+\cdots+a_{nn}x_{n}=b_{n} \]
which can be written in matrix notation in the form $Ax=b$ as:
\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots\\ x_{n}\end{bmatrix}=\begin{bmatrix} b_{1}\\ b_{2}\\ \vdots\\ b_{n}\end{bmatrix}. \]

A special property of linear systems of equations is that the number of possible solutions is limited. In particular:

Theorem 165 Systems of linear equations have either no solution, one solution, or an infinite number of solutions.

Generally we are interested in the case where exactly one solution exists, so that our model predicts that one thing and only one thing happens. We have:

Theorem 166 The system of equations $Ax=b$ has a unique solution if and only if $A^{-1}$ exists, in which case the unique solution is:
\[ x=A^{-1}b. \]

Example 1: You can verify that the system of equations in the supply and demand example above:
\[ \begin{bmatrix} 2 & 3\\ 2 & -1\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 12\\ 4\end{bmatrix} \]
has the unique solution:
\[ \begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 2 & 3\\ 2 & -1\end{bmatrix}^{-1}\begin{bmatrix} 12\\ 4\end{bmatrix}=\begin{bmatrix} \frac{1}{8} & \frac{3}{8}\\ \frac{1}{4} & -\frac{1}{4}\end{bmatrix}\begin{bmatrix} 12\\ 4\end{bmatrix}=\begin{bmatrix} 3\\ 2\end{bmatrix} \]
and so the unique solution is $x_{1}=3$ and $x_{2}=2$, so that $Q=3$ and $P=2$.
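Numerically the equilibrium can be computed with a linear solver (an illustrative numpy sketch):

    import numpy as np

    A = np.array([[2., 3.], [2., -1.]])
    b = np.array([12., 4.])
    print(np.linalg.solve(A, b))              # [3. 2.], so Q = 3 and P = 2
    print(np.linalg.inv(A) @ b)               # the same answer via the inverse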
Example 2: For the system of equations:
\[ 3x_{1}+2x_{2}=7 \]
\[ 6x_{1}+4x_{2}=14 \]
the second equation is just the first multiplied by $2$, and so there is really only one equation. This means that there are an infinite number of solutions, of the form:
\[ x_{2}=\frac{7-3x_{1}}{2} \]
where $x_{1}$ can be any number. Another way of seeing that there is a problem here is that the matrix $A$ is singular; that is:
\[ \det\begin{bmatrix} 3 & 2\\ 6 & 4\end{bmatrix}=0 \]
and so $A^{-1}$ does not exist.

Example 3: The system of equations:
\[ 3x_{1}+2x_{2}=7 \]
\[ 6x_{1}+4x_{2}=13 \]
has no solutions, since if we divide both sides of the second equation by $2$ we obtain:
\[ 3x_{1}+2x_{2}=7 \]
\[ 3x_{1}+2x_{2}=6.5 \]
which implies that $7=6.5$. Again, another way of seeing that there is a problem is that the matrix $A$ is singular, since:
\[ \det\begin{bmatrix} 3 & 2\\ 6 & 4\end{bmatrix}=0 \]
and so $A^{-1}$ does not exist.

Since we have seen that there are a number of different necessary and sufficient conditions for $A^{-1}$ to exist, we can state the above result more generally as:

Theorem 167 If $A$ is an $n\times n$ matrix, the following statements are equivalent, in the sense that if one statement holds all the rest hold as well:

1. $A^{-1}$ exists.
2. $\operatorname{rank}[A]=n$.
3. $\det[A]\neq 0$.
4. $Ax=b$ has a unique solution (i.e., $x=A^{-1}b$).
5. $Ax=0\implies x=0$.

Since these results are necessary and sufficient, we can restate this result using the negation of the above five statements as:

Theorem 168 If $A$ is an $n\times n$ matrix, the following statements are equivalent, in the sense that if one statement holds all the rest hold as well:

1. $A^{-1}$ does not exist.
2. $\operatorname{rank}[A]<n$.
3. $\det[A]=0$.
4. $Ax=b$ has either no solution or an infinite number of solutions.
5. There exists an $n\times 1$ vector $x\neq 0$ such that $Ax=0$.

3.8.1 Cramer's Rule

Cramer's rule, a method for solving systems of equations using determinants, is used a lot in economics. Given a system of equations $Ax=b$, suppose we want to calculate the $i$th component $x_{i}$. The key operation is replacing the $i$th column of $A$ with $b$.

Definition 169 Given an $n\times n$ matrix $A$ and an $n\times 1$ column vector $b$, define $A_{i}(b)$ as the $n\times n$ matrix obtained by replacing the $i$th column of $A$ with $b$.

Example: Given:
\[ A=\begin{bmatrix} 1 & 2\\ 3 & 4\end{bmatrix},\qquad b=\begin{bmatrix} 5\\ 6\end{bmatrix} \]
we obtain $A_{1}(b)$ by putting $b$ in the first column:
\[ A_{1}(b)=\begin{bmatrix} 5 & 2\\ 6 & 4\end{bmatrix} \]
and we obtain $A_{2}(b)$ by putting $b$ in the second column:
\[ A_{2}(b)=\begin{bmatrix} 1 & 5\\ 3 & 6\end{bmatrix}. \]

Cramer's rule then is:

Theorem 170 Cramer's Rule: Given the system of equations $Ax=b$ with $\det[A]\neq 0$, then:
\[ x_{i}=\frac{\det[A_{i}(b)]}{\det[A]}. \]

Example 1: The system of equations:
\[ 3x_{1}+2x_{2}=7 \]
\[ 5x_{1}-2x_{2}=10 \]
can be rewritten in matrix form as:
\[ \begin{bmatrix} 3 & 2\\ 5 & -2\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=\begin{bmatrix} 7\\ 10\end{bmatrix}. \]
Using Cramer's rule we have:
\[ x_{1}=\frac{\det\begin{bmatrix} 7 & 2\\ 10 & -2\end{bmatrix}}{\det\begin{bmatrix} 3 & 2\\ 5 & -2\end{bmatrix}}=\frac{-34}{-16}=\frac{17}{8},\qquad
x_{2}=\frac{\det\begin{bmatrix} 3 & 7\\ 5 & 10\end{bmatrix}}{\det\begin{bmatrix} 3 & 2\\ 5 & -2\end{bmatrix}}=\frac{-5}{-16}=\frac{5}{16}. \]

Example 2: Given the system of equations:
\[ \begin{bmatrix} 3 & 1 & 4\\ 1 & 2 & 6\\ 3 & 1 & 8\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\end{bmatrix}=\begin{bmatrix} 5\\ 6\\ 7\end{bmatrix}, \]
to find $x_{2}$ using Cramer's rule we replace the second column of $A$ with $b$, so that:
\[ x_{2}=\frac{\det\begin{bmatrix} 3 & 5 & 4\\ 1 & 6 & 6\\ 3 & 7 & 8\end{bmatrix}}{\det\begin{bmatrix} 3 & 1 & 4\\ 1 & 2 & 6\\ 3 & 1 & 8\end{bmatrix}}=\frac{24}{20}=\frac{6}{5}. \]
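Cramer's rule is equally easy to sketch in code; cramer below is an illustrative helper of ours, not a library routine, and assumes $\det[A]\neq 0$:

    import numpy as np

    def cramer(A, b):
        # Solve Ax = b by Cramer's rule: x_i = det(A_i(b)) / det(A).
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                      # replace the i-th column with b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[3., 2.], [5., -2.]]); b = np.array([7., 10.])
    print(cramer(A, b))                       # [2.125  0.3125] = [17/8, 5/16]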
3.9 Eigenvalues and Eigenvectors

3.9.1 Eigenvalues

Suppose we have a square $n\times n$ matrix $A$ and we multiply it by an $n\times 1$ column vector $x$ to obtain:
\[ y=Ax. \]
Note that $y$ is itself an $n\times 1$ column vector. There are very special vectors $x$, called eigenvectors, which have the property that $y=\lambda x$, in which case $\lambda$ is an eigenvalue. These turn out to be of fundamental importance in understanding matrices.

Definition 171 Eigenvalues and Eigenvectors: Let $A$ be a square $n\times n$ matrix and suppose that:
\[ Ax=\lambda x \]
where $x$ is an $n\times 1$ column vector and $\lambda$ is a scalar. Then we say that $x$ is an eigenvector of $A$ and $\lambda$ is an eigenvalue.

An $n\times n$ matrix $A$ will in general have $n$ eigenvalues, which are the roots of the characteristic polynomial associated with $A$:

Definition 172 Characteristic Polynomial: Given an $n\times n$ matrix $A$, the characteristic polynomial of $A$ is:
\[ f(\lambda)=\det[A-\lambda I]=\alpha_{0}\lambda^{n}+\alpha_{1}\lambda^{n-1}+\alpha_{2}\lambda^{n-2}+\cdots+\alpha_{n} \]
where the coefficients $\alpha_{j}$ depend on the elements of the matrix $A$.

Theorem 173 An $n\times n$ matrix $A$ has $n$ eigenvalues $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$, which are the roots of the characteristic polynomial of $A$: $f(\lambda_{i})=0$ for $i=1,2,\ldots,n$.

Proof. Since $x=Ix$ we can rewrite $Ax=\lambda x$ as $Ax=\lambda Ix$, or as:
\[ (A-\lambda I)x=0. \]
Since this equation is of the form $Bx=0$ where $B=(A-\lambda I)$, and since we require that $x\neq 0$, it follows from Theorem 163 that this can only hold if $B$ is singular, so that:
\[ \det[B]=f(\lambda)=\det[A-\lambda I]=0 \]
and so $\lambda$ is a root of $f(\lambda)$. Since the characteristic polynomial $f(\lambda)$ is an $n$th degree polynomial, by Theorem 19, the fundamental theorem of algebra, $f(\lambda)$ has $n$ roots, which are the $n$ eigenvalues of $A$.

Example: The $2\times 2$ matrix
\[ A=\begin{bmatrix} 5 & -2\\ -2 & 8\end{bmatrix} \]
has a characteristic polynomial which is a quadratic, given by:
\[ f(\lambda)=\det\left[\begin{bmatrix} 5 & -2\\ -2 & 8\end{bmatrix}-\lambda\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}\right]=\det\begin{bmatrix} 5-\lambda & -2\\ -2 & 8-\lambda\end{bmatrix}=\lambda^{2}-13\lambda+36. \]
Note that the coefficient $-13$ on $\lambda$ is $-\operatorname{tr}[A]$ and the constant term $36$ is $\det[A]$, which is always the case for $2\times 2$ matrices. To find the eigenvalues of $A$ we need to find the roots of this quadratic, or the solutions to $\lambda^{2}-13\lambda+36=0$, which are:
\[ \lambda_{1,2}=\frac{-(-13)\pm\sqrt{(-13)^{2}-4\times 36}}{2} \]
or $\lambda_{1}=4$ and $\lambda_{2}=9$.

In the example above note that $\operatorname{tr}[A]=13=\lambda_{1}+\lambda_{2}=4+9$ and $\det[A]=36=\lambda_{1}\lambda_{2}=4\times 9$, so that the trace is equal to the sum of the eigenvalues and the determinant is equal to the product of the eigenvalues. This turns out always to be the case, so that:

Theorem 174 Given any $n\times n$ matrix $A$ with eigenvalues $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$:
\[ \det[A]=\lambda_{1}\times\lambda_{2}\times\cdots\times\lambda_{n},\qquad\operatorname{tr}[A]=\lambda_{1}+\lambda_{2}+\cdots+\lambda_{n}. \]

Since $\det[A]$ is the product of the eigenvalues we have:

Theorem 175 $A^{-1}$ exists if and only if no eigenvalue of $A$ equals $0$.

An important fact about eigenvalues is that:

Theorem 176 $A$ and $A^{T}$ have the same eigenvalues.

Proof. Since $\det[B]=\det\left[B^{T}\right]$ we have:
\[ f(\lambda)=\det[A-\lambda I]=\det\left[(A-\lambda I)^{T}\right]=\det\left[A^{T}-\lambda I\right] \]
and so $A^{T}$ and $A$ have the same characteristic polynomial and hence the same eigenvalues.

As you might expect, calculating eigenvalues for upper and lower triangular matrices (as well as diagonal matrices) is very easy. We have:

Theorem 177 If $A=[a_{ij}]$ is an upper or lower triangular matrix, or a diagonal matrix, then the eigenvalues of $A$ are the diagonal elements of $A$.

Proof. Given the assumptions about $A$, the characteristic polynomial of $A$ is:
\[ f(\lambda)=\det[A-\lambda I]=(a_{11}-\lambda)(a_{22}-\lambda)\times\cdots\times(a_{nn}-\lambda) \]
since $A-\lambda I$ is upper or lower triangular (or diagonal) and the determinant of such a matrix is the product of its diagonal elements. Therefore if $\lambda=a_{ii}$ then $f(\lambda)=0$ and $\lambda$ is an eigenvalue.

Example: The $3\times 3$ matrix $A$ below
\[ A=\begin{bmatrix} 4 & 77 & 99\\ 0 & 5 & 55\\ 0 & 0 & 6\end{bmatrix} \]
is upper triangular, so its characteristic polynomial is:
\[ f(\lambda)=(4-\lambda)(5-\lambda)(6-\lambda) \]
and the eigenvalues are the diagonal elements: $\lambda_{1}=4$, $\lambda_{2}=5$, $\lambda_{3}=6$.

3.9.2 Eigenvectors

For an $n\times n$ matrix $A$, associated with each of the $n$ eigenvalues $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ are $n$ eigenvectors $x_{1},x_{2},\ldots,x_{n}$ which satisfy:
\[ Ax_{i}=\lambda_{i}x_{i}. \]
We have:

Theorem 178 Eigenvectors associated with distinct eigenvalues are linearly independent.

Generally an $n\times n$ matrix will have $n$ distinct eigenvalues, so that there will be $n$ linearly independent eigenvectors. This in turn means that:

Theorem 179 If all eigenvalues of an $n\times n$ matrix $A$ are distinct then the matrix of eigenvectors $C$ given by:
\[ C=[x_{1},x_{2},\ldots,x_{n}] \]
has $\operatorname{rank}[C]=n$, so that $C^{-1}$ exists.

Remark: Complications can arise when there are repeated eigenvalues. For example, if the characteristic polynomial of a $3\times 3$ matrix $A$ were:
\[ f(\lambda)=(\lambda-2)^{2}(\lambda-6) \]
then the eigenvalues would be $\lambda_{1}=2$, $\lambda_{2}=2$, $\lambda_{3}=6$, with the repeated eigenvalue $2$. In this case there might only be $2$ linearly independent eigenvectors rather than $3$.

Another complication with eigenvectors is that, unlike eigenvalues, they are not uniquely defined. In particular, if $x_{i}$ is an eigenvector associated with the eigenvalue $\lambda_{i}$, then any scalar multiple of $x_{i}$ will also be an eigenvector; that is, if $\alpha$ is any scalar then:
\[ Ax=\lambda x\implies A(\alpha x)=\lambda(\alpha x). \]
For example, if $x$ is an eigenvector then $A(3x)=\lambda(3x)$ and so $3x$ is also an eigenvector. To pin down an eigenvector one needs to adopt some convention, and this convention changes with the application according to what is convenient. Often, for example, we adopt the convention that eigenvectors have unit length, so that if $x$ is an arbitrary eigenvector we work with $\tilde{x}=\frac{1}{\|x\|}x$, which satisfies $\|\tilde{x}\|=1$.

Example: Consider the matrix:
\[ A=\begin{bmatrix} 5 & -2\\ -2 & 8\end{bmatrix} \]
which we have seen has eigenvalues $\lambda_{1}=4$ and $\lambda_{2}=9$. The associated eigenvectors are:
\[ x_{1}=\begin{bmatrix} 2\\ 1\end{bmatrix}\leftrightarrow\lambda_{1}=4,\qquad x_{2}=\begin{bmatrix} 1\\ -2\end{bmatrix}\leftrightarrow\lambda_{2}=9. \]
For example:
\[ \begin{bmatrix} 5 & -2\\ -2 & 8\end{bmatrix}\begin{bmatrix} 2\\ 1\end{bmatrix}=4\begin{bmatrix} 2\\ 1\end{bmatrix}. \]
You can verify that $x_{1}$ and $x_{2}$ are linearly independent since:
\[ C=[x_{1},x_{2}]=\begin{bmatrix} 2 & 1\\ 1 & -2\end{bmatrix}\implies\det[C]=-5\neq 0. \]
Here $x_{1}$ and $x_{2}$ are not unique. Instead of $x_{1}$ we could equally well use the eigenvector $3x_{1}$ given by:
\[ 3x_{1}=\begin{bmatrix} 6\\ 3\end{bmatrix}\implies\begin{bmatrix} 5 & -2\\ -2 & 8\end{bmatrix}\begin{bmatrix} 6\\ 3\end{bmatrix}=4\begin{bmatrix} 6\\ 3\end{bmatrix}. \]
We can normalize $x_{1}$ and $x_{2}$ so that $\|x_{1}\|=1$ and $\|x_{2}\|=1$ using:
\[ \|x_{1}\|=\sqrt{2^{2}+1^{2}}=\sqrt{5},\qquad\|x_{2}\|=\sqrt{1^{2}+(-2)^{2}}=\sqrt{5} \]
so that the normalized eigenvectors would be:
\[ \tilde{x}_{1}=\frac{1}{\sqrt{5}}\begin{bmatrix} 2\\ 1\end{bmatrix}=\begin{bmatrix} \frac{2}{\sqrt{5}}\\ \frac{1}{\sqrt{5}}\end{bmatrix},\qquad
\tilde{x}_{2}=\frac{1}{\sqrt{5}}\begin{bmatrix} 1\\ -2\end{bmatrix}=\begin{bmatrix} \frac{1}{\sqrt{5}}\\ -\frac{2}{\sqrt{5}}\end{bmatrix}. \]
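numpy's eig routine returns the eigenvalues and unit-length eigenvectors directly; a check of this example (the ordering of the eigenvalues is not guaranteed):

    import numpy as np

    A = np.array([[5., -2.], [-2., 8.]])
    lam, C = np.linalg.eig(A)
    print(lam)                                # [4. 9.] (possibly in another order)
    print(C)                                  # columns: unit-length eigenvectors
    print(np.trace(A), lam.sum())             # 13 = 4 + 9
    print(np.linalg.det(A), lam.prod())       # 36 = 4 * 9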
3.9.3 The Relationship $A=C\Lambda C^{-1}$

We have seen that diagonal matrices are much easier to work with. It turns out that almost all matrices can be transformed into diagonal matrices with the eigenvalues along the diagonal. More precisely:

Theorem 180 If an $n\times n$ matrix $A$ has $n$ linearly independent eigenvectors then it can be written as:
\[ A=C\Lambda C^{-1} \]
where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ along the diagonal:
\[ \Lambda=\begin{bmatrix} \lambda_{1} & 0 & \cdots & 0\\ 0 & \lambda_{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \lambda_{n}\end{bmatrix} \]
and the $i$th column of the $n\times n$ matrix $C$ is the $i$th eigenvector $x_{i}$:
\[ C=[x_{1},x_{2},\ldots,x_{n}]. \]

Proof. We have:
\[ AC=[Ax_{1},Ax_{2},\ldots,Ax_{n}]=[\lambda_{1}x_{1},\lambda_{2}x_{2},\ldots,\lambda_{n}x_{n}]=C\Lambda. \]
Since the eigenvectors are linearly independent, $\operatorname{rank}[C]=n$ and so $C^{-1}$ exists. Post-multiplying both sides by $C^{-1}$ then yields $A=C\Lambda C^{-1}$.

Remark: There are some matrices which cannot be written as $A=C\Lambda C^{-1}$, but these are in some sense very rare. An example of such a matrix is:
\[ \begin{bmatrix} 1 & 1\\ 0 & 1\end{bmatrix}. \]
These exceptional matrices have two characteristics: 1) they have repeated eigenvalues and 2) they are not symmetric. Thus in the example above, since the matrix is upper triangular, we have the repeated eigenvalues $\lambda_{1}=1$ and $\lambda_{2}=1$. For such matrices one can use the Jordan representation, which we do not discuss here.

Given $A=C\Lambda C^{-1}$, suppose we multiply $A$ by itself, $A^{2}=A\times A$. Using the representation $A=C\Lambda C^{-1}$ we have:
\[ A^{2}=C\Lambda\underbrace{C^{-1}C}_{=I}\Lambda C^{-1}=C\Lambda\Lambda C^{-1}=C\Lambda^{2}C^{-1} \]
where, since $\Lambda$ is diagonal:
\[ \Lambda^{2}=\begin{bmatrix} \lambda_{1}^{2} & 0 & \cdots & 0\\ 0 & \lambda_{2}^{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \lambda_{n}^{2}\end{bmatrix}. \]
That is, we just square the eigenvalues along the diagonal of $\Lambda$. This means that the eigenvalues of $A^{2}$ are just the squares of the eigenvalues of $A$. In general we have:

Theorem 181 Given an $n\times n$ matrix $A$ written as $A=C\Lambda C^{-1}$, then:
\[ A^{n}=C\Lambda^{n}C^{-1}. \]

Proof. (By induction.) The theorem is true for $n=1$. Assuming it is true for $n-1$ we have:
\[ A^{n}=A^{n-1}\times A=C\Lambda^{n-1}\underbrace{C^{-1}C}_{=I}\Lambda^{1}C^{-1}=C\Lambda^{n-1}\Lambda^{1}C^{-1}=C\Lambda^{n}C^{-1}. \]

Theorem 182 If $A^{-1}$ exists then:
\[ A^{-1}=C\Lambda^{-1}C^{-1}. \]

Proof. Given that $A^{-1}$ exists, all eigenvalues are non-zero and so $\Lambda^{-1}$ exists. Therefore:
\[ C\Lambda^{-1}C^{-1}A=C\Lambda^{-1}\underbrace{C^{-1}C}_{=I}\Lambda C^{-1}=C\Lambda^{-1}\Lambda C^{-1}=CC^{-1}=I. \]

Example 1: The matrix:
\[ A=\begin{bmatrix} \frac{7}{3} & -\frac{1}{3}\\ -\frac{2}{3} & \frac{8}{3}\end{bmatrix} \]
has eigenvalues and eigenvectors given by:
\[ \lambda_{1}=2\leftrightarrow x_{1}=\begin{bmatrix} 1\\ 1\end{bmatrix},\qquad\lambda_{2}=3\leftrightarrow x_{2}=\begin{bmatrix} 1\\ -2\end{bmatrix} \]
so that:
\[ \Lambda=\begin{bmatrix} 2 & 0\\ 0 & 3\end{bmatrix},\qquad C=\begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix}. \]
The representation $A=C\Lambda C^{-1}$ then takes the form:
\[ \begin{bmatrix} \frac{7}{3} & -\frac{1}{3}\\ -\frac{2}{3} & \frac{8}{3}\end{bmatrix}
=\begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix}\begin{bmatrix} 2 & 0\\ 0 & 3\end{bmatrix}\begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix}^{-1}
=\begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix}\begin{bmatrix} 2 & 0\\ 0 & 3\end{bmatrix}\begin{bmatrix} \frac{2}{3} & \frac{1}{3}\\ \frac{1}{3} & -\frac{1}{3}\end{bmatrix} \]
which you can verify by carrying out the multiplication. To calculate $A^{2}$ directly we have:
\[ A^{2}=\begin{bmatrix} \frac{7}{3} & -\frac{1}{3}\\ -\frac{2}{3} & \frac{8}{3}\end{bmatrix}\times\begin{bmatrix} \frac{7}{3} & -\frac{1}{3}\\ -\frac{2}{3} & \frac{8}{3}\end{bmatrix}=\begin{bmatrix} \frac{17}{3} & -\frac{5}{3}\\ -\frac{10}{3} & \frac{22}{3}\end{bmatrix} \]
while with $A^{2}=C\Lambda^{2}C^{-1}$ we have:
\[ \begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix}\begin{bmatrix} 2^{2} & 0\\ 0 & 3^{2}\end{bmatrix}\begin{bmatrix} \frac{2}{3} & \frac{1}{3}\\ \frac{1}{3} & -\frac{1}{3}\end{bmatrix}=\begin{bmatrix} \frac{17}{3} & -\frac{5}{3}\\ -\frac{10}{3} & \frac{22}{3}\end{bmatrix}. \]
To calculate $A^{-1}$ from $A^{-1}=C\Lambda^{-1}C^{-1}$ we have:
\[ A^{-1}=\begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix}\begin{bmatrix} 2^{-1} & 0\\ 0 & 3^{-1}\end{bmatrix}\begin{bmatrix} \frac{2}{3} & \frac{1}{3}\\ \frac{1}{3} & -\frac{1}{3}\end{bmatrix}=\begin{bmatrix} \frac{4}{9} & \frac{1}{18}\\ \frac{1}{9} & \frac{7}{18}\end{bmatrix}. \]

Example 2: We can use the representation $A=C\Lambda C^{-1}$ to prove Theorem 174. We have:
\[ \det[A]=\det\left[C\Lambda C^{-1}\right]=\det[C]\det[\Lambda]\det\left[C^{-1}\right]=\det[C]\det[\Lambda]\frac{1}{\det[C]}=\det[\Lambda]=\lambda_{1}\times\lambda_{2}\times\cdots\times\lambda_{n} \]
since the determinant of a diagonal matrix is the product of its diagonal elements. Similarly, since $\operatorname{tr}[AB]=\operatorname{tr}[BA]$, we have:
\[ \operatorname{tr}[A]=\operatorname{tr}\left[C\Lambda C^{-1}\right]=\operatorname{tr}\left[\Lambda C^{-1}C\right]=\operatorname{tr}[\Lambda]=\lambda_{1}+\lambda_{2}+\cdots+\lambda_{n}. \]
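A numerical check of the representation, and of Theorems 181 and 182, for the matrix of Example 1 (an illustrative numpy sketch):

    import numpy as np

    A = np.array([[7/3, -1/3], [-2/3, 8/3]])
    lam, C = np.linalg.eig(A)
    Cinv = np.linalg.inv(C)
    print(np.round(C @ np.diag(lam) @ Cinv, 10))        # recovers A
    print(np.round(C @ np.diag(lam**2) @ Cinv, 10))     # equals A @ A
    print(np.round(C @ np.diag(1/lam) @ Cinv, 10))      # equals np.linalg.inv(A)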
Example 3: We can use the representation $A=C\Lambda C^{-1}$ to prove the matrix version of the geometric series:

Theorem 183 Given an $n\times n$ matrix $A$ with eigenvalues $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ which all satisfy $|\lambda_{i}|<1$, then:
\[ (I-A)^{-1}=I+A+A^{2}+A^{3}+\cdots. \]

For example, with
\[ A=\begin{bmatrix} 0.3 & 0.65\\ 0.2 & 0.72\end{bmatrix} \]
the two eigenvalues of $A$ are $\lambda_{1}=0.92725$ and $\lambda_{2}=0.0927$. Since these both satisfy $|\lambda_{i}|<1$ we have:
\[ \left(\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}-\begin{bmatrix} 0.3 & 0.65\\ 0.2 & 0.72\end{bmatrix}\right)^{-1}=\begin{bmatrix} 4.24 & 9.85\\ 3.03 & 10.61\end{bmatrix}
=\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}+\begin{bmatrix} 0.3 & 0.65\\ 0.2 & 0.72\end{bmatrix}+\begin{bmatrix} 0.3 & 0.65\\ 0.2 & 0.72\end{bmatrix}^{2}+\cdots. \]
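The convergence of the geometric series can be seen by accumulating partial sums (a sketch of ours; 200 terms is ample here since $0.92725^{200}$ is negligible):

    import numpy as np

    A = np.array([[0.3, 0.65], [0.2, 0.72]])
    S = np.eye(2)
    term = np.eye(2)
    for _ in range(200):                      # partial sums I + A + A^2 + ...
        term = term @ A
        S += term
    print(np.round(S, 2))                     # [[ 4.24  9.85] [ 3.03 10.61]]
    print(np.round(np.linalg.inv(np.eye(2) - A), 2))    # the same matrix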
3.9.4 Left and Right-Hand Eigenvectors

Although $A$ and $A^{T}$ have the same eigenvalues, they do not in general share the same eigenvectors. Let $y_{i}$ be the eigenvector of $A^{T}$ corresponding to the eigenvalue $\lambda_{i}$, and let $x_{i}$ be the eigenvector of $A$. Since $y_{i}$ satisfies:
\[ A^{T}y_{i}=\lambda_{i}y_{i}, \]
by taking transposes of both sides we obtain:
\[ y_{i}^{T}A=\lambda_{i}y_{i}^{T}. \]
For this reason $y_{i}^{T}$ is referred to as a left-hand eigenvector of $A$, while $x_{i}$ is the right-hand eigenvector. We thus have:

Definition 184 The left and right-hand eigenvectors of $A$ corresponding to the eigenvalue $\lambda_{i}$ are defined respectively by:
\[ y_{i}^{T}A=\lambda_{i}y_{i}^{T},\qquad Ax_{i}=\lambda_{i}x_{i}. \]

We then have the following result:

Theorem 185 The left and right-hand eigenvectors of a matrix $A$ corresponding to different eigenvalues are orthogonal to each other.

Proof. Let $y_{j}$ be the left-hand eigenvector corresponding to the eigenvalue $\lambda_{j}$ and let $x_{i}$ be the right-hand eigenvector corresponding to the eigenvalue $\lambda_{i}$, with $\lambda_{i}\neq\lambda_{j}$. Then:
\[ y_{j}^{T}A=\lambda_{j}y_{j}^{T}\implies y_{j}^{T}Ax_{i}=\lambda_{j}y_{j}^{T}x_{i} \]
\[ Ax_{i}=\lambda_{i}x_{i}\implies y_{j}^{T}Ax_{i}=\lambda_{i}y_{j}^{T}x_{i} \]
so that:
\[ \lambda_{i}y_{j}^{T}x_{i}=\lambda_{j}y_{j}^{T}x_{i}\implies(\lambda_{i}-\lambda_{j})y_{j}^{T}x_{i}=0\implies y_{j}^{T}x_{i}=0 \]
where the last step follows from $\lambda_{i}\neq\lambda_{j}$. Since $y_{j}^{T}x_{i}=0$ it follows that $y_{j}$ and $x_{i}$ are orthogonal.

Example: We have seen above that the matrix:
\[ A=\begin{bmatrix} \frac{7}{3} & -\frac{1}{3}\\ -\frac{2}{3} & \frac{8}{3}\end{bmatrix} \]
has right-hand eigenvectors given by:
\[ \lambda_{1}=2\leftrightarrow x_{1}=\begin{bmatrix} 1\\ 1\end{bmatrix},\qquad\lambda_{2}=3\leftrightarrow x_{2}=\begin{bmatrix} 1\\ -2\end{bmatrix}. \]
The left-hand eigenvectors are the eigenvectors calculated from $A^{T}$:
\[ y_{1}=\begin{bmatrix} 2\\ 1\end{bmatrix}\leftrightarrow\lambda_{1}=2,\qquad y_{2}=\begin{bmatrix} -1\\ 1\end{bmatrix}\leftrightarrow\lambda_{2}=3 \]
since, for example:
\[ A^{T}y_{1}=\begin{bmatrix} \frac{7}{3} & -\frac{2}{3}\\ -\frac{1}{3} & \frac{8}{3}\end{bmatrix}\begin{bmatrix} 2\\ 1\end{bmatrix}=\begin{bmatrix} 4\\ 2\end{bmatrix}=2\begin{bmatrix} 2\\ 1\end{bmatrix}=2y_{1}. \]
As predicted by the theorem, the eigenvectors $x_{1}$ and $y_{2}$ are orthogonal since:
\[ x_{1}^{T}y_{2}=\begin{bmatrix} 1 & 1\end{bmatrix}\begin{bmatrix} -1\\ 1\end{bmatrix}=1\times(-1)+1\times 1=0 \]
and the eigenvectors $x_{2}$ and $y_{1}$ are orthogonal since:
\[ x_{2}^{T}y_{1}=\begin{bmatrix} 1 & -2\end{bmatrix}\begin{bmatrix} 2\\ 1\end{bmatrix}=1\times 2+(-2)\times 1=0. \]

3.9.5 Symmetric and Orthogonal Matrices

A nice property for a matrix to have is for its transpose to equal its inverse. Such matrices are called orthogonal matrices:

Definition 186 If $C^{-1}=C^{T}$ then $C$ is an orthogonal matrix.

Remark 1: If $C$ is orthogonal and is written as a collection of column vectors, $C=[c_{1},c_{2},\ldots,c_{n}]$, then the columns of $C$ are orthogonal to each other, that is $c_{i}^{T}c_{j}=0$ for $i\neq j$, and have unit length, that is $\|c_{i}\|=\sqrt{c_{i}^{T}c_{i}}=1$.

Remark 2: If $C$ is orthogonal then it preserves length. That is, given any $n\times 1$ vector $x$ and $y=Cx$, we have:
\[ \|y\|=\sqrt{y^{T}y}=\sqrt{(Cx)^{T}(Cx)}=\sqrt{x^{T}C^{T}Cx}=\sqrt{x^{T}C^{-1}Cx}=\sqrt{x^{T}x}=\|x\|. \]

Remark 3: The only scalars that have the property $x=x^{-1}$ are $1$ and $-1$. Similarly, any real eigenvalue of an orthogonal matrix must equal $1$ or $-1$. This follows since the eigenvalues of $C$ and $C^{T}$ are the same and $C^{T}=C^{-1}$; since the eigenvalues of $C^{-1}$ are the inverses of the eigenvalues of $C$, any real eigenvalue must satisfy $\lambda=\lambda^{-1}$.

It turns out that when a matrix $A$ is symmetric the representation $A=C\Lambda C^{-1}$ always exists. Furthermore, the matrix $C$ is orthogonal, that is $C^{-1}=C^{T}$, so that:

Theorem 187 If $A$ is a symmetric matrix then it can be written as:
\[ A=C\Lambda C^{-1}=C\Lambda C^{T} \]
where $C^{T}=C^{-1}$ is an orthogonal matrix.

Example: The symmetric matrix $A$ given by:
\[ A=\begin{bmatrix} 5 & -2\\ -2 & 5\end{bmatrix} \]
has eigenvalues $\lambda_{1}=3$ and $\lambda_{2}=7$. The representation $A=C\Lambda C^{-1}=C\Lambda C^{T}$ takes the form:
\[ A=\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix} 3 & 0\\ 0 & 7\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix} \]
where:
\[ C=\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix},\qquad
C^{-1}=C^{T}=\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix},\qquad
\Lambda=\begin{bmatrix} 3 & 0\\ 0 & 7\end{bmatrix}. \]
We then have:
\[ A^{2}=\begin{bmatrix} 5 & -2\\ -2 & 5\end{bmatrix}\times\begin{bmatrix} 5 & -2\\ -2 & 5\end{bmatrix}=\begin{bmatrix} 29 & -20\\ -20 & 29\end{bmatrix}
=\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix} 3^{2} & 0\\ 0 & 7^{2}\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix} \]
and:
\[ A^{-1}=\begin{bmatrix} 5 & -2\\ -2 & 5\end{bmatrix}^{-1}=\begin{bmatrix} \frac{5}{21} & \frac{2}{21}\\ \frac{2}{21} & \frac{5}{21}\end{bmatrix}
=\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix} 3^{-1} & 0\\ 0 & 7^{-1}\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix}. \]
The matrix $C$ is orthogonal since:
\[ C^{-1}=\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix}^{-1}=\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{bmatrix}=C^{T}. \]
Note that the column vectors of $C$ are orthogonal to each other and have a length of $1$.
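For symmetric matrices numpy provides eigh, which returns an orthogonal $C$ with eigenvalues in ascending order; a check of this example (our sketch):

    import numpy as np

    A = np.array([[5., -2.], [-2., 5.]])
    lam, C = np.linalg.eigh(A)                # eigh: for symmetric matrices
    print(lam)                                # [3. 7.]
    print(np.round(C.T @ C, 10))              # the identity: C is orthogonal
    print(np.round(C @ np.diag(lam) @ C.T, 10))   # recovers A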
3.10 Linear and Quadratic Functions in $\mathbb{R}^{n+1}$

In this section we begin our treatment of multivariate functions. This topic will be treated in more generality in the next chapter. Here we emphasize linear algebra concepts, in particular the multivariate generalizations of the linear and quadratic functions.

3.10.1 Linear Functions

A line in the $(x,y)$ plane, that is in the two-dimensional space $\mathbb{R}^{2}$, can be represented by the linear function $y=ax+b$. Suppose we try to generalize this to the case where $x$ is an $n\times 1$ vector. In this case we have the equivalent of a line in an $(n+1)$-dimensional space, with $n$ dimensions for $x$ and $1$ dimension for $y$.

Definition 188 A linear function in an $(n+1)$-dimensional space $\mathbb{R}^{n+1}$ takes the form:
\[ y=a^{T}x+b \]
where $a$ and $x$ are $n\times 1$ vectors and $b$ is a scalar.

Example: Consider the case where $n=2$ and:
\[ a=\begin{bmatrix} 2\\ -3\end{bmatrix},\qquad x=\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix},\qquad b=10. \]
Then the linear function is:
\[ y=a^{T}x+b=\begin{bmatrix} 2 & -3\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}+10=2x_{1}-3x_{2}+10. \]
Since $n=2$ this describes a plane in a $2+1=3$ dimensional space $\mathbb{R}^{3}$, with two dimensions for the $x$'s and one for $y$. (Figure: the plane $y=2x_{1}-3x_{2}+10$; omitted.)

3.10.2 Quadratics

Next to the linear function, the most basic function in the $(x,y)$ plane is the quadratic $y=ax^{2}$. Consider the problem of generalizing this to where $x$ is an $n\times 1$ vector. If $x$ is a vector we cannot write $ax^{2}$, since $x^{2}=x\times x$ is not defined! However, if we rewrite $ax^{2}$ as $x^{T}ax$ and replace $a$ by an $n\times n$ matrix $A$, then this is defined when we let $x$ be an $n\times 1$ vector. This turns out to be the most useful generalization.

Definition 189 Quadratic Form: If $x$ is an $n\times 1$ vector and $A$ is a symmetric $n\times n$ matrix then a quadratic form in $\mathbb{R}^{n+1}$ is defined as:
\[ x^{T}Ax. \]

Remark: There is no loss of generality in assuming that $A$ is symmetric, since if $A$ is not symmetric we could replace $A$ with $B=\left(A+A^{T}\right)/2$, which is symmetric, and:
\[ x^{T}Bx=\frac{x^{T}\left(A+A^{T}\right)x}{2}=\frac{1}{2}\left(x^{T}Ax+x^{T}A^{T}x\right)=x^{T}Ax \]
since $x^{T}Ax$ is a scalar and consequently $x^{T}Ax=\left(x^{T}Ax\right)^{T}=x^{T}A^{T}x$.

Example 1: If $n=2$ then
\[ y=x^{T}Ax=\begin{bmatrix} x_{1} & x_{2}\end{bmatrix}\begin{bmatrix} a_{11} & a_{12}\\ a_{12} & a_{22}\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}
\implies y=a_{11}x_{1}^{2}+2a_{12}x_{1}x_{2}+a_{22}x_{2}^{2}. \]

Example 2: If $n=2$ and
\[ A=\begin{bmatrix} 2 & -1\\ -1 & 3\end{bmatrix} \]
then
\[ y=x^{T}Ax=\begin{bmatrix} x_{1} & x_{2}\end{bmatrix}\begin{bmatrix} 2 & -1\\ -1 & 3\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}\implies y=2x_{1}^{2}-2x_{1}x_{2}+3x_{2}^{2}. \]
This quadratic form describes a valley-like quadratic in three-dimensional space. (Figure: the surface $y=2x_{1}^{2}-2x_{1}x_{2}+3x_{2}^{2}$; omitted.)

Now to generalize from $y=ax^{2}+bx+c$, we replace $ax^{2}$ with the quadratic form $x^{T}Ax$ and we replace the linear function $bx+c$ with its multivariate generalization $b^{T}x+c$ to obtain:

Definition 190 Quadratic: A quadratic in $\mathbb{R}^{n+1}$ takes the form:
\[ y=x^{T}Ax+b^{T}x+c \]
where $A$ is a symmetric $n\times n$ matrix, $b$ and $x$ are $n\times 1$ column vectors, and $y$ and $c$ are scalars.

Example 1: If $n=2$ and
\[ A=\begin{bmatrix} 2 & -1\\ -1 & 3\end{bmatrix},\qquad b=\begin{bmatrix} 4\\ 5\end{bmatrix},\qquad c=10 \]
then
\[ y=x^{T}Ax+b^{T}x+c\implies y=2x_{1}^{2}-2x_{1}x_{2}+3x_{2}^{2}+4x_{1}+5x_{2}+10. \]
This describes a valley-like quadratic in three-dimensional space, $\mathbb{R}^{3}$. (Figure: the surface $y=2x_{1}^{2}-2x_{1}x_{2}+3x_{2}^{2}+4x_{1}+5x_{2}+10$; omitted.)

Example 2: If we replace $A$ with $-A$ in Example 1 then the quadratic becomes:
\[ y=-2x_{1}^{2}+2x_{1}x_{2}-3x_{2}^{2}+4x_{1}+5x_{2}+10 \]
which describes a mountain-like function in three dimensions, $\mathbb{R}^{3}$. (Figure: the surface $y=-2x_{1}^{2}+2x_{1}x_{2}-3x_{2}^{2}+4x_{1}+5x_{2}+10$; omitted.)

3.10.3 Positive and Negative Definite Matrices

One of the reasons quadratics are so important is their close relationship with the notions of concavity and convexity. For the ordinary quadratic $f(x)=ax^{2}$, if $a>0$ then $f''(x)=2a>0$ and $f(x)$ is globally convex, while if $a<0$ then $f''(x)=2a<0$ and $f(x)$ is globally concave.

Now instead of $ax^{2}$, a multivariate quadratic takes the form $f(x)=x^{T}Ax$. It turns out that if $A>0$, or $A$ is positive definite, then $f(x)$ is convex. In Example 1 above it turns out that $A>0$, and from the plot it appears that $f(x)$ is indeed valley-like or convex. Similarly, if $A<0$, or $A$ is negative definite, then $f(x)$ is concave or mountain-like. In Example 2 above it turns out that $A<0$, and from the plot it appears that $f(x)$ is indeed mountain-like or concave.

Now it is not obvious what we mean if we write $A>0$ or $A\geq 0$ or $A<0$ or $A\leq 0$ when $A$ is a symmetric matrix. The key to extending these inequalities to matrices is the quadratic form. If we write for a scalar $a$ that $a\geq 0$, this is equivalent to saying that $ax^{2}\geq 0$ for all $x$ (since $x^{2}\geq 0$). Similarly, if we say $a>0$ this is equivalent to saying that $ax^{2}>0$ for all $x\neq 0$. In general we have:
\[ a>0\iff\left(ax^{2}>0\text{ for all }x\neq 0\right) \]
\[ a\geq 0\iff\left(ax^{2}\geq 0\text{ for all }x\right) \]
\[ a<0\iff\left(ax^{2}<0\text{ for all }x\neq 0\right) \]
\[ a\leq 0\iff\left(ax^{2}\leq 0\text{ for all }x\right). \]
To generalize to matrices we replace $ax^{2}$ with the quadratic form $x^{T}Ax$. Thus $A\geq 0$ if $x^{T}Ax\geq 0$ for all $x$. This leads to the following definitions, where $x$ is an $n\times 1$ vector:

Definition 191 We say that $A$ is positive definite, or $A>0$, if and only if $x^{T}Ax>0$ for all $x$ except $x=0$.

Definition 192 We say that $A$ is positive semi-definite, or $A\geq 0$, if and only if $x^{T}Ax\geq 0$ for all $x$.

Definition 193 We say that $A$ is negative definite, or $A<0$, if and only if $x^{T}Ax<0$ for all $x$ except $x=0$.

Definition 194 We say that $A$ is negative semi-definite, or $A\leq 0$, if and only if $x^{T}Ax\leq 0$ for all $x$.
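As a numerical spot-check of these definitions (suggestive only, not a proof; proper tests follow in Sections 3.10.4 and 3.10.5), we can evaluate the quadratic form of Example 1 above at a handful of random points:

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[2., -1.], [-1., 3.]])      # the valley-like quadratic form above
    for _ in range(5):
        x = rng.normal(size=2)
        print(np.round(x @ A @ x, 4))         # positive for every draw, as A > 0 requires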
Remark 1: If $x=0$ then $x^{T}Ax=0$ no matter what $A$ is. This is the reason this case is excluded in the definitions of positive and negative definite matrices.

Remark 2: If $A$ is positive (negative) definite, it follows from the definition that the quadratic $x^{T}Ax$ has a unique global minimum (maximum) at $x^{*}=0$. This is because $y=x^{T}Ax$ describes an $(n+1)$-dimensional valley (mountain), or convex (concave) function, with $x^{*}=0$ the bottom of the valley (top of the mountain). As we shall see later, it is a matrix of second derivatives (called the Hessian) being positive or negative definite which determines whether any function in $\mathbb{R}^{n+1}$ is convex or concave.

Example 1: Consider the case where $n=2$ and
\[ A=\begin{bmatrix} 2 & -5\\ -5 & 13\end{bmatrix} \]
in which case:
\[ x^{T}Ax=\begin{bmatrix} x_{1} & x_{2}\end{bmatrix}\begin{bmatrix} 2 & -5\\ -5 & 13\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=2x_{1}^{2}-10x_{1}x_{2}+13x_{2}^{2}. \]
We can show that $A$ is positive semi-definite, or $A\geq 0$, since for all $x_{1},x_{2}$:
\[ x^{T}Ax=2x_{1}^{2}-10x_{1}x_{2}+13x_{2}^{2}=(x_{1}-2x_{2})^{2}+(x_{1}-3x_{2})^{2}\geq 0 \]
since the sum of two squares can never be negative (you can verify that the second expression is correct by expanding the two terms and showing it equals the previous expression). We therefore conclude that $x^{T}Ax\geq 0$ for all $x$, and so by definition $A$ is positive semi-definite.

We can however prove more: that in fact $A$ is positive definite. Suppose $x^{T}Ax=0$. This could only occur if $(x_{1}-2x_{2})^{2}=0$ and $(x_{1}-3x_{2})^{2}=0$, which in turn implies that $x_{1}=2x_{2}$ and $x_{1}=3x_{2}$. This can only occur if $x_{1}=x_{2}=0$, since otherwise:
\[ x_{1}=2x_{2}\text{ and } x_{1}=3x_{2}\implies 2x_{2}=3x_{2}\implies 2=3 \]
which is a contradiction. Thus $x^{T}Ax=0$ only occurs when $x=0$, so that $x^{T}Ax>0$ for all $x$ except $x=0$, and so by definition the matrix $A$ is positive definite.

Example 2: Consider the case where $n=2$ and
\[ A=\begin{bmatrix} -2 & 2\\ 2 & -2\end{bmatrix} \]
in which case:
\[ x^{T}Ax=\begin{bmatrix} x_{1} & x_{2}\end{bmatrix}\begin{bmatrix} -2 & 2\\ 2 & -2\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix}=-2x_{1}^{2}+4x_{1}x_{2}-2x_{2}^{2}. \]
We can show that $A$ is negative semi-definite, or $A\leq 0$, since for all $x_{1},x_{2}$:
\[ x^{T}Ax=-2x_{1}^{2}+4x_{1}x_{2}-2x_{2}^{2}=-2(x_{1}-x_{2})^{2}\leq 0. \]
We cannot, however, prove that $A<0$, or that $A$ is negative definite, since there does exist an $x\neq 0$ such that $x^{T}Ax=0$. In particular, if $x_{1}=x_{2}=1$ then:
\[ x^{T}Ax=-2(1-1)^{2}=0 \]
and so $A$ is not negative definite.

The inequality $a>0$ ($a<0$) is a strong inequality while $a\geq 0$ ($a\leq 0$) is a weak inequality. This is because if $a>0$ ($a<0$) then it follows immediately that $a\geq 0$ ($a\leq 0$), but one cannot conclude from $a\geq 0$ ($a\leq 0$) that $a>0$ ($a<0$), since if $a\geq 0$ it is possible that $a=0$, in which case $a>0$ would be false. Thus knowing that $a>0$ is a stronger result than knowing $a\geq 0$, just as knowing that $a\geq 0$ is a weaker result than knowing that $a>0$. These same relationships also hold for matrices. In particular:

Theorem 195 If $A>0$ then $A\geq 0$, so that a positive definite matrix is always positive semi-definite; but a positive semi-definite matrix is not necessarily positive definite.

Theorem 196 If $A<0$ then $A\leq 0$, so that a negative definite matrix is always negative semi-definite; but a negative semi-definite matrix is not necessarily negative definite.

Example: In Example 2 above the matrix
\[ \begin{bmatrix} -2 & 2\\ 2 & -2\end{bmatrix} \]
is negative semi-definite, so $A\leq 0$, but it is not negative definite.

Definiteness and the existence of $A^{-1}$

Recall that for scalars, if a number $a$ satisfies $a\geq 0$ but not $a>0$ then it must be that $a=0$. For matrices, if $A\geq 0$ but not $A>0$ ($A$ is positive semi-definite but not positive definite), then it does not follow that $A=0$, that is, that all elements of $A$ are zero. However, it does follow that $A$ has certain zero-like properties, in particular that its determinant is $0$ and hence that it does not have an inverse.
This is summarized below:

Theorem 197 If $A>0$, so that $A$ is positive definite, then it is non-singular, or $A^{-1}$ exists.

Theorem 198 If $A\geq 0$, or $A$ is positive semi-definite, but $A$ is not positive definite, then $A$ is singular, or $A^{-1}$ does not exist.

Theorem 199 If $A<0$, so that $A$ is negative definite, then it is non-singular, or $A^{-1}$ exists.

Theorem 200 If $A\leq 0$, or $A$ is negative semi-definite, but $A$ is not negative definite, then $A$ is singular and $A^{-1}$ does not exist.

Example: Consider the two matrices:
\[ \begin{bmatrix} 2 & -5\\ -5 & 13\end{bmatrix},\qquad\begin{bmatrix} -2 & 2\\ 2 & -2\end{bmatrix}. \]
The first we showed is positive definite, while the second is negative semi-definite but not negative definite. Note that
\[ \det\begin{bmatrix} 2 & -5\\ -5 & 13\end{bmatrix}=1,\qquad\det\begin{bmatrix} -2 & 2\\ 2 & -2\end{bmatrix}=0 \]
so that the second matrix is singular and so does not have an inverse, while the first is non-singular and has an inverse.
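Numerically (an illustrative sketch):

    import numpy as np

    A = np.array([[2., -5.], [-5., 13.]])     # positive definite
    B = np.array([[-2., 2.], [2., -2.]])      # negative semi-definite, not definite
    print(np.round(np.linalg.det(A), 10), np.round(np.linalg.det(B), 10))   # 1.0 and 0.0
    print(np.linalg.inv(A))                   # exists
    # np.linalg.inv(B) would raise LinAlgError: B is singular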
A positive definite matrix must have positive diagonal elements, but the off-diagonal elements can be either positive or negative. In general we have:

Theorem 201 If $A>0$, or $A$ is positive definite, all diagonal elements must be greater than $0$ (or $a_{ii}>0$ for $i=1,2,\ldots,n$).

Theorem 202 If $A\geq 0$, or $A$ is positive semi-definite, all diagonal elements must be greater than or equal to $0$ (or $a_{ii}\geq 0$ for $i=1,2,\ldots,n$).

Theorem 203 If $A<0$, or $A$ is negative definite, all diagonal elements must be less than $0$ (or $a_{ii}<0$ for $i=1,2,\ldots,n$).

Theorem 204 If $A\leq 0$, or $A$ is negative semi-definite, all diagonal elements must be less than or equal to $0$ (or $a_{ii}\leq 0$ for $i=1,2,\ldots,n$).

Remark 1: The signs of the diagonal elements provide necessary conditions but not sufficient conditions. For example, it turns out that the matrix:
\[ \begin{bmatrix} 1 & 4\\ 4 & 2\end{bmatrix} \]
is not positive definite even though the diagonal elements are both positive. For the matrix:
\[ A=\begin{bmatrix} 1 & 1\\ 1 & -2\end{bmatrix} \]
we can immediately conclude that it is not positive definite (nor positive semi-definite), since it has a negative diagonal element $-2$. We can also conclude that it is not negative definite (nor negative semi-definite), since it also has a positive diagonal element. Note that this last example shows that while for ordinary scalars it is always the case that either $a\geq 0$ or $a\leq 0$, this is not true for matrices; for the $A$ above it is neither the case that $A\geq 0$ nor that $A\leq 0$.

As usual, it is much easier to analyze diagonal matrices for definiteness. We have:

Theorem 205 If $A$ is a diagonal matrix then $A>0$, or $A$ is positive definite, if and only if all diagonal elements are greater than $0$ (or $a_{ii}>0$ for $i=1,2,\ldots,n$).

Theorem 206 If $A$ is a diagonal matrix then $A\geq 0$, or $A$ is positive semi-definite, if and only if all diagonal elements are greater than or equal to $0$ (or $a_{ii}\geq 0$ for $i=1,2,\ldots,n$).

Theorem 207 If $A$ is a diagonal matrix then $A<0$, or $A$ is negative definite, if and only if all diagonal elements are less than $0$ (or $a_{ii}<0$ for $i=1,2,\ldots,n$).

Theorem 208 If $A$ is a diagonal matrix then $A\leq 0$, or $A$ is negative semi-definite, if and only if all diagonal elements are less than or equal to $0$ (or $a_{ii}\leq 0$ for $i=1,2,\ldots,n$).

Example: The diagonal matrices:
\[ \begin{bmatrix} 1 & 0\\ 0 & 2\end{bmatrix},\quad\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix},\quad\begin{bmatrix} -1 & 0\\ 0 & -2\end{bmatrix},\quad\begin{bmatrix} -1 & 0\\ 0 & 0\end{bmatrix} \]
are respectively positive definite, positive semi-definite, negative definite and negative semi-definite. The diagonal matrix:
\[ \begin{bmatrix} 1 & 0\\ 0 & -2\end{bmatrix} \]
is neither positive definite, positive semi-definite, negative definite nor negative semi-definite.

The Negative of a Positive Definite Matrix

With scalars, if you multiply a positive number by $-1$ you get a negative number. The same result holds for matrices: if you multiply a positive definite matrix by $-1$ you get a negative definite matrix. In general:

Theorem 209 $A$ is positive definite (or $A>0$) if and only if $-A$ is negative definite ($-A<0$).

Theorem 210 $A$ is positive semi-definite (or $A\geq 0$) if and only if $-A$ is negative semi-definite (or $-A\leq 0$).

This means that if you have, say, a positive definite matrix, then you can find a negative definite matrix by multiplying all of its elements by $-1$.

Example: Since we have already seen that:
\[ \begin{bmatrix} 2 & -5\\ -5 & 13\end{bmatrix} \]
is positive definite, it follows that:
\[ \begin{bmatrix} -2 & 5\\ 5 & -13\end{bmatrix} \]
is negative definite.

The Form $A=B^{T}B$

Recall that if a scalar is of the form $a=b^{2}$ then we immediately obtain the weak inequality $a\geq 0$, since a square is always non-negative. Given the additional information $b\neq 0$ we can obtain the strong inequality $a>0$. Now suppose that the matrix $A$ takes a similar form: $A=B^{T}B$. We then obtain the weak inequality $A\geq 0$, or that $A$ is positive semi-definite. Furthermore, by restricting $B$ we obtain the strong inequality $A>0$, or that $A$ is positive definite. This result turns out to be quite important in econometrics.

Theorem 211 If $B$ is an $m\times n$ matrix and $A=B^{T}B$ then $A\geq 0$, or $A$ is positive semi-definite.

Theorem 212 If $B$ is an $m\times n$ matrix with $\operatorname{rank}[B]=n$ then $A=B^{T}B$ is positive definite, or $A>0$.

Proof. If $A=B^{T}B$ then define $y=Bx$. Now for all $x$:
\[ x^{T}Ax=x^{T}B^{T}Bx=(Bx)^{T}Bx=y^{T}y=\|y\|^{2}\geq 0. \]
Now if $\operatorname{rank}[B]=n$ then $y=Bx=0\implies x=0$. Therefore for $x\neq 0$ it follows that $y\neq 0$, so that $x^{T}Ax=\|y\|^{2}>0$, and hence $A$ is positive definite.

Example: Given:
\[ B=\begin{bmatrix} 3 & 4\\ -2 & 1\\ 6 & 2\end{bmatrix} \]
you can verify that $\operatorname{rank}[B]=2$; that is, the two columns of $B$ are linearly independent. Thus by Theorem 212 the matrix $A=B^{T}B$ given by:
\[ B^{T}B=\begin{bmatrix} 3 & -2 & 6\\ 4 & 1 & 2\end{bmatrix}\begin{bmatrix} 3 & 4\\ -2 & 1\\ 6 & 2\end{bmatrix}=\begin{bmatrix} 49 & 22\\ 22 & 21\end{bmatrix} \]
is positive definite.

3.10.4 Using Determinants to Check for Definiteness

We can use determinants to check whether a matrix is positive or negative definite. The key concept is the leading principal minor, defined as:

Definition 213 Leading Principal Minors: The $i$th leading principal minor of the $n\times n$ matrix $A$ is given by:
\[ M_{i}=\det[A_{ii}] \]
where $A_{ii}$ is the $i\times i$ matrix obtained from the first $i$ rows and columns of $A$.

Example: If $A$ is given by:
\[ A=\begin{bmatrix} 3 & 1 & 2\\ 1 & 6 & 3\\ 2 & 3 & 8\end{bmatrix} \]
we have three leading principal minors:
\[ M_{1}=\det[3]=3,\qquad M_{2}=\det\begin{bmatrix} 3 & 1\\ 1 & 6\end{bmatrix}=17,\qquad M_{3}=\det\begin{bmatrix} 3 & 1 & 2\\ 1 & 6 & 3\\ 2 & 3 & 8\end{bmatrix}=97. \]

We have:

Theorem 214 The matrix $A$ is positive definite if and only if all the leading principal minors are strictly positive; that is:
\[ M_{1}>0,\ M_{2}>0,\ \ldots,\ M_{n}>0. \]

Theorem 215 The matrix $A$ is negative definite if and only if the leading principal minors alternate in sign with the first being negative, or:
\[ M_{1}<0,\ M_{2}>0,\ M_{3}<0,\ \ldots. \]

Example 1: For a general $2\times 2$ symmetric matrix
\[ A=\begin{bmatrix} a_{11} & a_{12}\\ a_{12} & a_{22}\end{bmatrix} \]
to be positive definite we require that $M_{1}=a_{11}>0$ and $M_{2}=\det[A]=a_{11}a_{22}-a_{12}^{2}>0$. The last result implies that:
\[ |a_{12}|<\sqrt{a_{11}a_{22}}. \]
Thus in addition to the diagonal elements being positive, we require that the off-diagonal element not be too large in absolute value relative to the diagonal elements. Thus the matrices:
\[ \begin{bmatrix} 1 & 4\\ 4 & 2\end{bmatrix},\qquad\begin{bmatrix} 1 & -4\\ -4 & 2\end{bmatrix} \]
are both not positive definite, since the off-diagonal element, here either $4$ or $-4$, is too large relative to the diagonal elements, or $4>\sqrt{1\times 2}$. Thus for both matrices $M_{1}=1>0$ but $M_{2}=-14<0$, and so neither matrix is positive definite.
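The leading principal minors are easy to compute in code; leading_minors below is an illustrative helper of ours:

    import numpy as np

    def leading_minors(A):
        # Leading principal minors M_1, ..., M_n of a square matrix A.
        n = A.shape[0]
        return [round(float(np.linalg.det(A[:i, :i])), 6) for i in range(1, n + 1)]

    A = np.array([[3., 1., 2.], [1., 6., 3.], [2., 3., 8.]])
    print(leading_minors(A))                              # [3.0, 17.0, 97.0]: all positive
    print(leading_minors(np.array([[1., 4.], [4., 2.]]))) # [1.0, -14.0]: not positive definite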
Example 2: The matrix we considered above:
\[ A=\begin{bmatrix} 3 & 1 & 2\\ 1 & 6 & 3\\ 2 & 3 & 8\end{bmatrix} \]
is positive definite since $M_{1}=3>0$, $M_{2}=17>0$ and $M_{3}=97>0$.

Example 3: If
\[ A=\begin{bmatrix} -3 & -1 & -2\\ -1 & -6 & -3\\ -2 & -3 & -8\end{bmatrix} \]
then:
\[ M_{1}=\det[-3]=-3<0,\qquad M_{2}=\det\begin{bmatrix} -3 & -1\\ -1 & -6\end{bmatrix}=17>0,\qquad M_{3}=\det\begin{bmatrix} -3 & -1 & -2\\ -1 & -6 & -3\\ -2 & -3 & -8\end{bmatrix}=-97<0. \]
Since the leading principal minors alternate in sign with $M_{1}<0$, it follows that $A$ is negative definite.

Example 4: Given:
\[ B=\begin{bmatrix} 3 & 4\\ -2 & 1\\ 6 & 2\end{bmatrix} \]
you can verify that $\operatorname{rank}[B]=2$; that is, the two columns of $B$ are linearly independent. Thus Theorem 212 predicts that $B^{T}B$ is positive definite. To verify this note that:
\[ B^{T}B=\begin{bmatrix} 3 & -2 & 6\\ 4 & 1 & 2\end{bmatrix}\begin{bmatrix} 3 & 4\\ -2 & 1\\ 6 & 2\end{bmatrix}=\begin{bmatrix} 49 & 22\\ 22 & 21\end{bmatrix} \]
which is positive definite since, from the leading principal minors, $M_{1}=49>0$ and $M_{2}=545>0$.

Remark 1: At first it is tricky to remember the condition for $A$ to be negative definite, since intuitively one would think that all leading principal minors must be negative. It may help you remember the rule to consider a diagonal matrix with negative diagonal elements, which we know is negative definite. For example:
\[ \begin{bmatrix} -1 & 0 & 0\\ 0 & -2 & 0\\ 0 & 0 & -3\end{bmatrix}. \]
Here $M_{1}=-1<0$, but $M_{2}=-1\times-2=2>0$ because the product of two negative numbers is positive. Finally $M_{3}=-1\times-2\times-3=-6<0$. This is why the $M_{i}$ must alternate in sign for negative definite matrices.

Remark 2: We cannot easily extend these criteria to positive semi-definite and negative semi-definite matrices. For example, it does not follow from $M_{1}\geq 0,\ M_{2}\geq 0,\ \ldots$ that $A$ is positive semi-definite. For example, the matrix:
\[ \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & -1\end{bmatrix} \]
has $M_{1}=1\geq 0$, $M_{2}=0\geq 0$ and $M_{3}=0\geq 0$, but this matrix is not positive semi-definite since it has a negative diagonal element.

3.10.5 Using Eigenvalues to Check for Definiteness

An alternative method for testing for definiteness is to use eigenvalues. We have:

Theorem 216 If $A$ is a symmetric $n\times n$ matrix with eigenvalues $\lambda_{i}$, $i=1,2,\ldots,n$, then:

1. $A$ is positive definite ($A>0$) if and only if $\lambda_{i}>0$ for $i=1,2,\ldots,n$.
2. $A$ is positive semi-definite ($A\geq 0$) if and only if $\lambda_{i}\geq 0$ for $i=1,2,\ldots,n$.
3. $A$ is negative definite ($A<0$) if and only if $\lambda_{i}<0$ for $i=1,2,\ldots,n$.
4. $A$ is negative semi-definite ($A\leq 0$) if and only if $\lambda_{i}\leq 0$ for $i=1,2,\ldots,n$.

Proof. We prove only 1 and 2; 3 and 4 follow the same reasoning. Given that $A$ is symmetric we have $A=C\Lambda C^{T}$, where $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ along its diagonal and $C$ is an orthogonal matrix, so that $C^{T}=C^{-1}$. We then have:
\[ x^{T}Ax=x^{T}C\Lambda C^{T}x=y^{T}\Lambda y=y_{1}^{2}\lambda_{1}+y_{2}^{2}\lambda_{2}+\cdots+y_{n}^{2}\lambda_{n} \]
where $y=C^{T}x$ is an $n\times 1$ vector. Since $\Lambda$ is a diagonal matrix it follows that $x^{T}Ax=y^{T}\Lambda y\geq 0$ for all $x$ if and only if $\lambda_{i}\geq 0$ for all $i$, and so 2 follows. Since $C^{T}$ is non-singular, it follows that $x=0$ if and only if $y=0$, so that $x^{T}Ax=y^{T}\Lambda y>0$ for all $x\neq 0$ if and only if $\lambda_{i}>0$ for all $i$, and so 1 follows.

Example 1: If $A$ is given by:
\[ A=\begin{bmatrix} 2 & -5\\ -5 & 13\end{bmatrix} \]
we find that the eigenvalues are:
\[ \lambda_{1}=\frac{15}{2}+\frac{1}{2}\sqrt{221}=14.933>0,\qquad\lambda_{2}=\frac{15}{2}-\frac{1}{2}\sqrt{221}=0.066966>0. \]
It follows that $A$ is positive definite.

Example 2: If $A$ is given by:
\[ A=\begin{bmatrix} -2 & 2\\ 2 & -2\end{bmatrix} \]
we find that the eigenvalues are $\lambda_{1}=-4<0$ and $\lambda_{2}=0$. Since all the eigenvalues satisfy $\lambda_{i}\leq 0$ it follows that $A$ is negative semi-definite. However, since $\lambda_{2}=0$ it follows that $A$ is not negative definite.
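numpy's eigvalsh returns the eigenvalues of a symmetric matrix in ascending order, which makes this test one line per matrix (our sketch, for Examples 1 and 2):

    import numpy as np

    print(np.linalg.eigvalsh(np.array([[2., -5.], [-5., 13.]])))   # [ 0.067 14.933]: A > 0
    print(np.linalg.eigvalsh(np.array([[-2., 2.], [2., -2.]])))    # [-4.  0.]: A <= 0 only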
Example 3: If $A$ is given by:
\[ A=\begin{bmatrix} 1 & 4\\ 4 & 2\end{bmatrix} \]
we find that the eigenvalues are:
\[ \lambda_{1}=\frac{3}{2}+\frac{1}{2}\sqrt{65}=5.531>0,\qquad\lambda_{2}=\frac{3}{2}-\frac{1}{2}\sqrt{65}=-2.531<0. \]
Since the eigenvalues have opposite signs, the matrix $A$ is neither positive definite, positive semi-definite, negative definite nor negative semi-definite.

3.10.6 Maximizing and Minimizing Quadratics

Consider the problem of maximizing (minimizing) the quadratic:
\[ y=x^{T}Ax+b^{T}x+c \]
where $x$ is an $n\times 1$ vector, $A$ is a symmetric $n\times n$ matrix, $b$ is an $n\times 1$ vector and $c$ is a scalar. We also assume that $A$ is negative (positive) definite, which implies that $A^{-1}$ exists. In the next chapter we will see how to solve this problem using multivariate calculus. It is possible to find $x^{*}$ without calculus using a technique called completing the square. We have:

Theorem 217 The value of $x$ which maximizes (minimizes)
\[ y=x^{T}Ax+b^{T}x+c, \]
where $A$ is negative (positive) definite, is:
\[ x^{*}=-\frac{1}{2}A^{-1}b. \]

Proof. Completing the square amounts to showing that:
\[ x^{T}Ax+b^{T}x+c=\left(x-x^{*}\right)^{T}A\left(x-x^{*}\right)+c-\frac{b^{T}A^{-1}b}{4}. \]
You can verify this as follows. Using $x^{*}=-\frac{1}{2}A^{-1}b$ (and the symmetry of $A$ and $A^{-1}$):
\[ \left(x-x^{*}\right)^{T}A\left(x-x^{*}\right)=x^{T}Ax-x^{*T}Ax-x^{T}Ax^{*}+x^{*T}Ax^{*}
=x^{T}Ax+\frac{1}{2}b^{T}A^{-1}Ax+\frac{1}{2}x^{T}AA^{-1}b+\frac{1}{4}b^{T}A^{-1}AA^{-1}b \]
\[ =x^{T}Ax+\frac{1}{2}b^{T}x+\frac{1}{2}x^{T}b+\frac{1}{4}b^{T}A^{-1}b
=x^{T}Ax+b^{T}x+\frac{1}{4}b^{T}A^{-1}b \]
since $b^{T}x=x^{T}b$. Thus:
\[ x^{T}Ax+b^{T}x=\left(x-x^{*}\right)^{T}A\left(x-x^{*}\right)-\frac{1}{4}b^{T}A^{-1}b \]
from which it follows that:
\[ y=\left(x-x^{*}\right)^{T}A\left(x-x^{*}\right)+c-\frac{b^{T}A^{-1}b}{4}. \]
If $A$ is negative definite then it follows that:
\[ \left(x-x^{*}\right)^{T}A\left(x-x^{*}\right)<0\text{ for }x-x^{*}\neq 0,\qquad\left(x-x^{*}\right)^{T}A\left(x-x^{*}\right)=0\text{ for }x-x^{*}=0 \]
and hence:
\[ y\leq c-\frac{b^{T}A^{-1}b}{4} \]
with equality only when $x=x^{*}$. It follows that $x^{*}$ is a global maximum. If $A$ is positive definite then replace $<$ with $>$ above and $x^{*}$ is a global minimum.

Example 1: If $n=2$ and
\[ A=\begin{bmatrix} 2 & -1\\ -1 & 3\end{bmatrix},\qquad b=\begin{bmatrix} 4\\ 5\end{bmatrix},\qquad c=10 \]
then
\[ y=x^{T}Ax+b^{T}x+c\implies y=2x_{1}^{2}-2x_{1}x_{2}+3x_{2}^{2}+4x_{1}+5x_{2}+10. \]
You can check that the matrix $A$ is positive definite, since $M_{1}=2>0$ and $M_{2}=5>0$, so we look for a minimum of this valley-like quadratic. We have:
\[ x^{*}=-\frac{1}{2}A^{-1}b=-\frac{1}{2}\begin{bmatrix} 2 & -1\\ -1 & 3\end{bmatrix}^{-1}\begin{bmatrix} 4\\ 5\end{bmatrix}=\begin{bmatrix} -\frac{17}{10}\\ -\frac{7}{5}\end{bmatrix} \]
so that the global minimum occurs at $x_{1}^{*}=-\frac{17}{10}$ and $x_{2}^{*}=-\frac{7}{5}$.

Example 2: The linear regression model is:
\[ Y=X\beta+e \]
where $Y$ is an $n\times 1$ vector, $X$ is an $n\times p$ matrix of rank $p$, and $e$ is an $n\times 1$ vector of random errors. The least squares estimator $\hat{\beta}$ is the value of $\beta$ which minimizes the sum of squares function:
\[ S(\beta)=(Y-X\beta)^{T}(Y-X\beta)=\beta^{T}X^{T}X\beta-2Y^{T}X\beta+Y^{T}Y. \]
Although the notation may obscure this, $S(\beta)$ is in fact a quadratic: here $x$ is $\beta$, $x^{*}$ is $\hat{\beta}$, $A=X^{T}X$, $b=-2X^{T}Y$ and $c=Y^{T}Y$. If $\operatorname{rank}[X]=p$ then $A$ is positive definite by Theorem 212, and so making the translation in notation from
\[ x^{*}=-\frac{1}{2}A^{-1}b \]
we find that:
\[ \hat{\beta}=-\frac{1}{2}\underbrace{\left(X^{T}X\right)^{-1}}_{=A^{-1}}\underbrace{\left(-2X^{T}Y\right)}_{=b}=\left(X^{T}X\right)^{-1}X^{T}Y. \]
The formula $\hat{\beta}=\left(X^{T}X\right)^{-1}X^{T}Y$ is one of the central results in econometrics.

Example 3: Suppose one has data on 10 families, where $Y_{i}$ is the consumption of family $i$, $X_{i1}$ is the income of family $i$ and $X_{i2}$ is the wealth of family $i$, and suppose that:
\[ Y_{i}=X_{i1}\beta_{1}+X_{i2}\beta_{2}+e_{i} \]
for $i=1,2,\ldots,10$. The parameter $\beta_{1}$ is the marginal propensity to consume out of income, $\beta_{2}$ is the marginal propensity to consume out of wealth, and $e_{i}$ is a random error.
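A numerical check of Example 1's minimizer (our sketch; the comparison at the end is a spot-check, not a proof of global optimality):

    import numpy as np

    A = np.array([[2., -1.], [-1., 3.]]); b = np.array([4., 5.]); c = 10.0
    xstar = -0.5 * np.linalg.solve(A, b)
    print(xstar)                              # [-1.7 -1.4] = (-17/10, -7/5)

    def y(x):
        return x @ A @ x + b @ x + c

    print(y(xstar))                           # 3.1, the minimum value
    print(y(xstar + np.array([0.1, 0.0])) > y(xstar))   # True: a nearby point is higher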
Suppose that the actual data take the form:
\[ Y=\begin{bmatrix} 22.1\\ 88.2\\ 7.8\\ 29.2\\ 8.8\\ 217.4\\ 35.1\\ 61.9\\ 11.4\\ 52.7\end{bmatrix},\qquad
X=\begin{bmatrix} 10 & 100\\ 25 & 355\\ 7 & 10\\ 41 & 50\\ 10 & 3\\ 75 & 860\\ 21 & 62\\ 77 & 107\\ 21 & 10\\ 71 & 45\end{bmatrix} \]
so that, for example, family 1's consumption is $22.1$, its income is $10$ and its wealth is $100$. From this data we wish to estimate $\beta_{1}$ and $\beta_{2}$ using the least squares estimator $\hat{\beta}=\left(X^{T}X\right)^{-1}X^{T}Y$. We have:
\[ X^{T}X=\begin{bmatrix} 20032 & 89471\\ 89471 & 895652\end{bmatrix} \]
and:
\[ X^{T}Y=\begin{bmatrix} 29555.3\\ 233334.4\end{bmatrix}. \]
Thus the least squares estimator $\hat{\beta}$ is given by:
\[ \hat{\beta}=\left(X^{T}X\right)^{-1}X^{T}Y=\begin{bmatrix} 20032 & 89471\\ 89471 & 895652\end{bmatrix}^{-1}\begin{bmatrix} 29555.3\\ 233334.4\end{bmatrix}=\begin{bmatrix} 0.56\\ 0.20\end{bmatrix}. \]
The estimated marginal propensity to consume out of income is then $\hat{\beta}_{1}=0.56$, while the estimated marginal propensity to consume out of wealth is $\hat{\beta}_{2}=0.20$.
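The whole least squares computation is a few lines of numpy, reproducing the numbers above up to rounding (an illustrative sketch):

    import numpy as np

    Y = np.array([22.1, 88.2, 7.8, 29.2, 8.8, 217.4, 35.1, 61.9, 11.4, 52.7])
    X = np.array([[10., 100.], [25., 355.], [7., 10.], [41., 50.], [10., 3.],
                  [75., 860.], [21., 62.], [77., 107.], [21., 10.], [71., 45.]])
    print(X.T @ X)                            # [[ 20032.  89471.] [ 89471. 895652.]]
    beta = np.linalg.solve(X.T @ X, X.T @ Y)  # the least squares coefficients
    print(np.round(beta, 2))                  # [0.56 0.2 ]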
Now since x 6= 0 it follows that ¸ (1 ¡ ¸) = 0 so that: ¸ = 0 or ¸ = 1: Since ¸ = 0 or ¸ = 1 it follows that ¸ ¸ 0 and so P is positive semi-de…nite. Since tr [P ] is the sum of the eigenvalues which equals the number of eigenvalues equal to 1 which is the rank of P: If P is symmetric or P = P T then and w = P x and z = (I ¡ P ) y then: wT z = (P x)T (I ¡ P ) y = xT P T (I ¡ P ) y = xT P (I ¡ P ) y = xT (P ¡ P P ) y = xT (P ¡ P ) y = xT 0y = 0 and so w and z are orthogonal. Example 1: Consider the idempotent matrices: P and I ¡ P from the previous example where: 2 1 1 1 3 2 2 3 ¡ 13 ¡ 13 3 3 3 3 2 ¡ 13 5 : P = 4 31 31 13 5 ; I ¡ P = 4 ¡ 13 3 1 1 1 2 1 1 ¡3 ¡3 3 3 3 3 The eigenvalues of P are determined from the characteristic polynomial: 2 1 3 1 1 3 ¡¸ 3 3 1 1 5 = ¸2 ¡ ¸3 f (¸) = det [P ¡ ¸I] = det 4 13 3 ¡¸ 3 1 1 1 3 3 3 ¡¸ = ¸2 (¸ ¡ 1) = 0 CHAPTER 3. MATRIX ALGEBRA 175 and so the eigenvalues are: ¸1 = 1; ¸2 = 0 and ¸3 = 0: Since the eigenvalues satisfy ¸ ¸ 0 it follows that P is positive semi-de…nite. Since two of the eigenvalues are 0 however P is not positive de…nite and hence P ¡1 does not exist. The trace of P is given by: 1 1 1 + + =1 3 3 3 which is the rank of P ( i.e., P only has one linearly independent column). The trace of I ¡ P is: 2 2 2 tr [I ¡ P ] = + + = 2 3 3 3 which is also equal to rank [I ¡ P ] : Let us take any 3 £ 1 vector x; say: 2 3 1 x=4 2 5 3 tr [P ] = and multiply x by P and I ¡ P to obtain: 2 1 1 1 32 1 3 3 3 w = P x = 4 13 31 31 5 4 2 1 1 1 3 3 23 32 1 ¡3 3 2 z = (I ¡ P ) x = 4 ¡ 13 3 1 ¡ 3 ¡ 13 3 2 3 2 5=4 2 5 2 32 3 2 3 1 1 ¡1 ¡3 ¡ 13 5 4 2 5 = 4 0 5 : 2 3 1 3 Note that w and z are orthogonal since: 2 3 ¡1 £ ¤ wT z = 2 2 2 4 0 5 = 2 £ ¡1 + 2 £ 0 + 2 £ 1 = 0: 1 Example 2: The vector of …tted values Y^ and the vector of least squares residuals are given by: Y^ = P Y; e^ = (I ¡ P ) Y ¢¡1 T ¡ X : Earlier we showed that tr [P ] = p and so the rank where P = X X T X of the n £ n matrix P is p: It follows that the two vectors are orthogonal from the above theorem or: Y^ T e^ = Y T = YT = YT (I ¡ P )T P Y (I ¡ P ) P Y ¡ ¢ P ¡ P2 Y = Y T 0Y = 0: CHAPTER 3. MATRIX ALGEBRA 176 Since Y = Y^ + e^ where Y^ and e^ are orthogonal there exists from Theorem 155 a Pythagorean relationship exists: Y T Y = Y^ T Y^ + e^T e^: Dividing both sides by Y T Y we obtain: 1= Y^ T Y^ e^T e^ + : Y TY Y TY The …rst term on the right is the uncentered R2 de…ne by: ° °2 °^° T ^ °Y ° ^ Y Y 2 R = T = Y Y kY k2 which measures the percentage variation in Y explained by the regression model. Alternatively from De…nition 152 the angle between Y and Y^ is: Y^ T Y ° ° ° ° kY k °Y^ ° ³ ´ Y^ T Y^ + e^ ° ° = ° ° kY k °Y^ ° ° ° ° °2 °^° °^° °Y ° °Y ° ° °= = ° ° kY k kY k °Y^ ° p R2 : = cos (µ) = Basically the closer R2 is to 1 the smaller the angle between Y and Y^ the more the model explains Y: Now since: ° °2 °^° °Y ° k^ ek2 2 = 1 ¡ : R = kY k2 kY k2 it follows that: 0 · R2 · 1: ^ = 0 (the model You might want to try and show that R2 = 0 if and only if ¯ 2 explains nothing) and R = 1 if and only if e^ = 0 (the model is a perfect …t). 3.11.2 The Spectral Representation Closely related to the representation A = C¤C ¡1 is the spectral representation. We have: CHAPTER 3. 
MATRIX ALGEBRA 177 Theorem 220 The Spectral Representation: Given an n £ n matrix A written as A = C¤C ¡1 then A = ¸1 P1 + ¸2 P2 + ¢ ¢ ¢ + ¸n Pn where the n £ n matrices Pi given by: Pi = xi yiT xTi yi and where xi and yi are the right and left-hand eigenvectors corresponding to the eigenvalue ¸i of A: The matrices Pi are idempotent and orthogonal to each other in that Pi £ Pi = Pi and Pi Pj = 0 for i 6= j: Remark: That the matrices Pi are idempotent follows from the fact that: Pi £ Pi = x yT x yT ¡ iT i ¢ £ ¡ iT i ¢ xi yi xi yi a scalar = z }| { xi yiT xi yiT ¡ T ¢2 xi yi =xT i yi = = z }| { yiT xi xTi yiT ¡ T ¢2 xi yi x yT ¡ iT i ¢ = Pi : xi yi That they are orthogonal follows from Theorem 185 that left and right-hand eigenvectors from di¤erent eigenvalues are orthogonal or: xTi yj = 0 for i 6= j: Thus: Pi £ Pj = x yT x yT ¡ iT i ¢ £ ¡ iT i ¢ xi yi xi yi =0 z }| { xi yiT xi yiT = ¡ ¢2 xTi yi = 0: An implication of the spectral representation then is that: Theorem 221 Given the spectral representation for the n £ n matrix A : A = ¸1 P1 + ¸2 P2 + ¢ ¢ ¢ + ¸n Pn CHAPTER 3. MATRIX ALGEBRA 178 then the mth power of A is given by: m m Am = ¸m 1 P1 + ¸2 P2 + ¢ ¢ ¢ + ¸n Pn : Proof. We will prove this by induction. Note that it is obviously true for m = 1: Now suppose it is true for m ¡ 1: We then have: Am = Am¡1 £ A ¢ ¡ = ¸m¡1 P1 + ¸m¡1 P2 + ¢ ¢ ¢ + ¸m¡1 Pn £ (¸1 P1 + ¸2 P2 + ¢ ¢ ¢ + ¸n Pn ) : 1 2 n Since Pi Pj = 0, any cross-product terms drop out and hence: ¢ ¢ ¢ ¡ ¡ ¡ Am = ¸m¡1 £ ¸1 P1 £ P1 + ¸m¡1 £ ¸2 P2 £ P2 + ¢ ¢ ¢ + ¸m¡1 £ ¸n Pn £ Pn 1 2 n m m = ¸m 1 P1 + ¸2 P2 + ¢ ¢ ¢ + ¸n Pn £ ¸i = ¸m since Pi £ Pi = Pi and ¸m¡1 i : i When a matrix is symmetric its left and right-hand eigenvectors are identical and so the spectral representation take the form: Theorem 222 If the n £ n matrix A is symmetric then A = ¸1 P1 + ¸2 P2 + ¢ ¢ ¢ + ¸n Pn where the n £ n matrices Pi given by: xi xT Pi = ¡ T i ¢ xi xi and where xi is the eigenvector corresponding to the eigenvalue ¸i of A:: The matrices Pi are idempotent and orthogonal to each other in that Pi £ Pi = Pi and Pi Pj = 0 for i 6= j: Example: For A below the spectral representation is: A= · 5 ¡2 ¡2 5 ¸ =¸1 z · z}|{ = 3 =P1 1 2 1 2 }| 1 2 1 2 ¸{ =¸2 z · z}|{ + 7 =P2 1 2 ¡ 12 }| ¡ 12 Note that P1 and P2 are idempotent; that: · 1 1 ¸· 1 ¸ · ¸ 0 0 ¡ 12 2 P1 £ P2 = 12 21 = 1 0 0 ¡ 12 2 2 2 and that: A2 ¸2 · ¸ 5 ¡2 29 ¡20 = ¡2 5 ¡20 29 · 1 1 ¸ · 1 ¸ ¡ 12 2 = 32 12 21 + 72 1 ¡ 12 2 2 2 = · 1 2 ¸{ : CHAPTER 3. MATRIX ALGEBRA 179 which you can verify. Also: A¡1 ¸¡1 · 5 5 ¡2 = 21 2 ¡2 5 · 1 1 ¸ ·21 = 3¡1 12 21 + 7¡1 = · 2 2 2 21 5 21 1 2 ¡ 12 ¸ ¡ 12 1 2 ¸ which you can verify by multiplying A and A¡1 . 3.12 Positive Matrices 3.12.1 The Perron-Frobenius Theorem In many economic applications it often happens that the elements of a matrix are all positive in which case we have: De…nition 223 Positive Matrix: We say a matrix A = [aij ] is positive if aij > 0 for all i; j: Example: The matrix: A= · 1 2 3 4 ¸ is a positive matrix. Note that a positive matrix is not the same as a positive de…nite matrix. Positive matrices occur in many economic applications: for example with input-output matrices, which describe the technological interdependency of the di¤erent industries in an economy, and Markov chains which describe how probabilities vary over time. In these applications the eigenvectors and eigenvalues turn out to be quite important. 
For example with input-output matrices an eigenvector determines equilibrium prices or the balanced growth vector and the associated eigenvalue determines both the rate of pro…t and the growth rate of the economy. We can then show that all prices will be positive, that the equilibrium price vector is unique, and that the growth rate of the economy is maximized by appealing to the Perron-Frobenius theorem: Theorem 224 The Perron-Frobenius Theorem I If A = [aij ] is an n £ n positive square matrix then: ^>0. 1. A has a unique positive eigenvalue ¸ ^. 2. If ¸i is any other eigenvalue of A then j¸i j < ¸ CHAPTER 3. MATRIX ALGEBRA 180 ^ is a positive n £ 1 right-hand eigenvector x ^ = [^ xi ] (i.e., 3. Associated with ¸ yi ] (i.e., with y^i > 0 ^i > 0) and a positive left-hand eigenvector y^ = [^ with x ) which satisfy: ^ x; y^T A = ¸^ ^yT : A^ x = ¸^ These positive eigenvectors are unique up to a scalar multiple. 4. No other eigenvectors exist which have all positive elements. Remark: Note that we have assumed that for A = [aij ] that aij > 0 and so we have not allowed any aij = 0. This assumption can be relaxed considerably as long as A is remains indecomposable, which requires that An have all positive elements for some n. Example: Consider the 2 £ 2 matrix A given by: · ¸ 0:3 0:5 A= 0:2 0:7 which has all positive elements. You can verify that the eigenvalues of A are ^ = ¸1 = 0:87417; ¸2 = 0:12583; ¸ ^ that the associated right-hand eigenvectors are: that j¸2 j < ¸; · ¸ · ¸ 0:70753 ¡0:94435 x ^ = x1 = ; x2 = 0:81248 0:32895 and the associated left-hand eigenvectors are: £ ¤ £ ¤ y^T = y1T = 0:3544 1:0174 ; y2T = ¡0:754 0:65672 ^ and y^ have all positive elements, and the other eigenvectors do not and that x have all positive elements. 3.12.2 Markov Chains Suppose that workers can be in 1 of 2 states: either employed, state 1 or unemployed, state 2: Suppose further that the probability of a worker being employed next year depends only on whether he is employed or unemployed this year. First suppose the worker is employed this year. Let the probability of employment next year given that he is employed this year be p11 where 0 < p11 < 1 and let the probability of unemployment next year given that he is employed this year be: p12 = 1 ¡ p11 : Since he will be either employed or unemployed next year: p12 = 1 ¡ p11 with 0 < p12 < 1: Now suppose he is unemployed this year. Let the probability of unemployment next year given that he is unemployed this year be p22 where 0 < p22 < 1 and let the probability of employment next year given that he is unemployed CHAPTER 3. MATRIX ALGEBRA 181 this year be: p21 = 1 ¡ p22 Since he will be either employed or unemployed next year: p21 = 1 ¡ p22 with 0 < p21 < 1: We can put all these probabilities in a 2£2 matrix called P called a transition matrix as: · ¸ p11 p12 P = : p21 p22 Note that the rows of P sum to 1 since probabilities sum to 1 and all the elements of P are positive so later on we can apply the Perron-Frobenius theorem. Now suppose you want to know the probability that a worker employed today will be unemployed in 2 years from now, or for that matter n years from now. We have the following result: Theorem 225 Let pij (n) is the probability of being in state j in n periods given that the worker is in state i today. 
Then: · ¸n ¸ · p11 p12 p11 (n) p12 (n) n : =P = p21 p22 p21 (n) p22 (n) Example: Consider the transition matrix · ¸ 0:95 0:05 P = 0:4 0:6 so that someone employed today has a 95% chance of being employed next year, and someone unemployed today has a 60% chance of being unemployed next year. To calculate the corresponding probabilities for two years from now we calculate: · ¸· ¸ 0:95 0:05 0:95 0:05 2 = P 0:4 0:6 0:4 0:6 · ¸ 0:9225 0:0775 = : 0:62 0:38 Thus someone employed today has a probability of 92% probability of being employed in two years, and someone unemployed today has a 38% probability of being unemployed in 2 years. Now consider n = 10 years in the future. We have using the computer that: · ¸10 0:95 0:05 P 10 = 0:4 0:6 · ¸ · ¸ · ¸ 0:95 0:05 0:95 0:05 0:95 0:05 = £ £ ¢¢¢ £ 0:4 0:6 0:4 0:6 0:4 0:6 · ¸ 0:88917 0:11083 = 0:88664 0:1336 CHAPTER 3. MATRIX ALGEBRA 182 Thus someone employed today has a probability of 89% probability of being employed in 10 years, and someone unemployed today has a 13% probability of being unemployed in 10 years, and this is independent of whether you are employed or unemployed this year! If we let n get even larger for say n = 50 then this pattern becomes even more striking. Using the computer we have: P 50 = · ¸50 ¸ 0:889 0:111 0:889 0:111 · ¸ ¤ 1 £ 0:89 0:11 : = 1 = · 0:95 0:05 0:4 0:6 The probabilities 0:89 and 0:11 are the long-run probabilities (or equilibrium probabilities) of being employed and unemployed. Thus the long-run rate of unemployment for the work force would be 11%: It turns out that the vector of long-run probabilities: £ ¤ y = 0:89 0:11 is the left-hand eigenvector of P associated with the eigenvalue of 1; that is: yP = ¸y with ¸ = 1 or: · ¸ £ ¤ 0:95 0:05 £ ¤ 0:89 0:11 = 1 £ 0:89 0:11 : 0:4 0:6 The is part of a very general result. We have Theorem 226 If P = · p11 p21 p12 p22 ¸ is an transition matrix with positive elements and rows which sum to 1 then: · ¸ ¤ 1 £ n p 1¡p lim P = 1 n!1 where p and 1 ¡ p are the long-run probabilities of being in state 1 and state 2 and where 0 < p < 1: Proof. Since the rows of P sum to 1 it follows that if · ¸ 1 ¶= 1 CHAPTER 3. MATRIX ALGEBRA 183 then: · p11 p12 p21 p22 · ¸ 1 = =¶ 1 P¶ = ¸· 1 1 ¸ = · p11 + p12 p21 + p22 ¸ ^ = ¶ is the unique positive right-hand eigenvector of P corresponding so that x ^ = ¸1 = 1: By the Perron-Frobenius theorem we know that to the eigenvalue ¸ ^ = 1. We also know that the other eigenvalue is less that 1 so that : j¸2 j < ¸ there exists a corresponding left-hand eigenvector £ ¤ y^ = y1 y2 with y^1 > 0 and y^2 > 0: We can normalize y^ so the elements sum to 1 by 1 ^ as: dividing by y1 + y2 and setting p = y^1y^+^ y2 and so we can write y y^ = £ p 1¡p ¤ : Now from the spectral representation for P we have: P ^ xy^ + ¸2 x2 y2 = ¸^ = ¶^ y + ¸2 x2 y2 and: P n = ¶y + ¸n2 x2 y2 : ^ = 1 it follows that as n ! 1 that ¸n ! 0 so that: Since j¸2 j < ¸ 2 · ¸ · ¸ ¤ p 1¡p 1 £ n p 1¡p = : P ! ¶y = p 1¡p 1 Remark 1: There is actually no reason to limit ourselves to 2 states. For example a worker might conceivably be in say four states: 1 full-time employment, 2 part-time employment, 3 unemployment and 4 not being in the labour force. In this case the transition matrix is a 4 £ 4 matrix with positive entries and rows which sum to 1: For example: 2 3 0:9 0:05 0:04 0:01 6 0:4 0:5 0:06 0:04 7 7 P =6 4 0:2 0:3 0:4 0:1 5 0:05 0:05 0:2 0:7 so that someone unemployed today has a probability p32 = 0:3 of having parttime work next year. CHAPTER 3. 
MATRIX ALGEBRA 184 Remark 2: Problems can arise if some of the elements of P are 0: For example if: · ¸ 0:6 0:4 P = 0 1 then if state 2 is unemployment then an unemployed worker never …nds a job. In this case unemployment is an absorbing state and employment is a transitory state; all workers eventually become permanently unemployed. 3.12.3 General Equilibrium and Matrix Algebra One of the …rst things you learn in economics is the supply and demand model. This is known as partial equilibrium analysis since it abstracts from the way di¤erent markets interact with each other. In general equilibrium analysis on the other hand we explicitly treat the way di¤erent markets interact. General equilibrium analysis generally requires quite advanced mathematical techniques. For example it was only with the development in the last 60 years in mathematics of what are called …xed-point theorems that economists have been able to prove that a set of prices exists which will equate demand and supply in all markets in the economy. Here we will give you a taste of general equilibrium analysis for an economy with a Leontief technology and where technology determines prices independent of tastes. Thus consider an economy where there i = 1; 2; : : : n goods that are produced. Let aij be the amount of good j needed to produce 1 unit of good i: We can put the aij 0 s into an n £ n matrix A as: 2 3 a11 a12 ¢ ¢ ¢ a1n 6 a21 a22 ¢ ¢ ¢ a2n 7 6 7 A=6 . .. .. 7 : .. 4 .. . . . 5 an1 an2 ¢ ¢ ¢ ann The matrix A; referred to as an input-output matrix, captures the Leontief technology of this economy. Let pj be the price of good j: The cost of producing one unit of good i is given by: ci = ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn or if we de…ne the n £ 1 vector of costs as c = [ci ] and the n £ 1 vector of prices: p = [pj ] then in matrix form: c = Ap: The revenue from producing 1 unit of good i is just the price: pi so that pro…ts are given by: pi ¡ (ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn ) CHAPTER 3. MATRIX ALGEBRA 185 and the rate of pro…t in industry i is given by pro…ts divided by the costs so that: ¼i = pi ¡ (ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn ) : ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn Now in equilibrium the rate of pro…t must be the same in each industry, otherwise no production would take place in those industries with a lower rate of pro…t. Thus we require that ¼i = ¼ for all i so that: ¼= pi ¡ (ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn ) ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn or: pi = (ai1 p1 + ai2 p2 + ¢ ¢ ¢ + ain pn ) (1 + ¼) or in matrix notation: p = Ap (1 + ¼) or written slightly di¤erently: Ap = 1 p: 1+¼ Note this takes the same form as Ax = ¸x where x is an eigenvector and ¸ is an eigenvalue. From this it follows that x = p is an eigenvector of the matrix 1 A and that ¸ = 1+¼ is an eigenvalue of A: In general there will be n eigenvalues and eigenvectors so the question is which one is the appropriate one. Since p is a vector of prices and prices are all positive, we cannot accept eigenvectors with negative elements. From the Perron-Frobenius theorem we know that there is only one eigen^ with all positive elements and this corresponds to the eigenvalue vector p = x ^ > 0 which determines the rate of pro…t of the economy. 
We therefore have: ¸ Theorem 227 There exists a unique (up to a scalar multiple) positive price vector which is the positive right-hand eigenvector of A associated with the eigen^ value ¸: Theorem 228 The equilibrium rate of pro…t is given by: ¼= 1 ¡ 1: ^ ¸ Remark: If p = x ^ is an equilibrium vector of prices then so too is ®^ p where ® is any positive scalar. This non-uniqueness is a general feature of general equilibrium models and corresponds to the fact that agents only care about relative prices. Thus if ® = 2 and we double all prices in the economy this will have no a¤ect on rational economic decision making or equilibrium in the CHAPTER 3. MATRIX ALGEBRA 186 economy. Thus while p and 2p correspond to di¤erent nominal price vectors, real or relative prices are the same. ^ is a right-hand eigenvector. It turns out that the leftNote that p = x hand eigenvector y^ also has an interesting economic interpretation. Thus by ^ there exists a the Perron-Frobenius theorem we know that corresponding to ¸ T ^ y: It turns out that: unique positive eigenvector y^ which satis…es y^ A = ¸^ ^ Theorem 229 y^ determines the balanced growth path for the economy and ¸ the growth rate of the economy. Proof. Let yi be the amount of good i produced. Then the input requirement of good j will be: rj = y1 a1j + y2 a2j + ¢ ¢ ¢ + yn anj or if we de…ne the 1 £ n vector of production levels as y = [yi ] and the 1 £ n vector of input requirements as r = [rj ] then in matrix notation: r = y^A: If there is balanced growth so that there is no unemployment or shortages in the economy, then: y = (1 + ½) r where 1 + ½ is the growth rate of the economy. In matrix notation we then have: yA = 1 y: 1+½ Since we require that all elements of y be positive it follows that: y = y^ and ^= 1 : ¸ 1+½ ^ in addition to determining the pro…t rate also determines the growth Thus ¸ rate of the economy along the balanced growth path. Therefore: Theorem 230 With balanced growth the rate of growth of the economy and the rate of pro…t are identical and are given by: ¼=½= 1 ¡ 1: ^ ¸ Example: Suppose the economy has n = 2 sectors and · ¸ 0:3 0:65 A= : 0:2 0:72 ^ = 0:92725 and ¸2 = 0:0927: It follows The two eigenvalues of A are then: ¸1 = ¸ that the rate of pro…t ¼ and the growth rate of the economy ½ are identical: ^ = 0:92725 = ¸ 1 1 = 1+¼ 1+½ CHAPTER 3. MATRIX ALGEBRA 187 so that: ¼=½= 1 ¡ 1 = 0:0785 0:92725 and the pro…t and growth rates are both 7:85%: ^ determines prices and The positive right-hand eigenvector associated with ¸ is: · ¸ 0:81754 p=x ^= : 0:78893 Thus the relative price of good 1 and 2 will be p1 0:81754 = 1:036 = p2 0:78893 so if p2 = 1 (the second good is the numeraire) then the real price of good 1 is 1:036 units of good 2: The positive left-hand eigenvector of A determines the balanced growth path and is: · ¸ 0:34513 y^ = 1:0824 and so: 0:34513 y^1 = 0:319 = y^2 1:0824 so that if no resources are to be unemployed and there are to be no shortages, then along the balanced growth path 0:319 units of factor 1 will be employed for every unit of factor 2 employed. Chapter 4 Multivariate Calculus 4.1 Functions of Many Variables To treat variables as constants is the characteristic vice of the unmathematical economist. -Francis Edgeworth Functions of only one variable: y = f (x) can only take us so far. Usually when we write such functions we have in mind that there are other variables in the background that are kept constant. 
For example although we might write a demand function as: Q = Q (P ) we know this is wrong, that the quantity demanded depends not only on the own price: P , but also on the price of other goods P1 ; P2 ; : : : Pn (substitutes and complements) and on income Y: We should instead write a demand function as: Q = Q (P; P1 ; P2 ; : : : Pn ; Y ) : The same argument would apply equally well to almost anything else we consider in economics for the simple reason that economic variables are generally in‡uenced by many other variables and not just one. Thus we now change our focus to functions of the form: y = f (x1 ; x2 ; : : : xn ) and the calculus tools we need to work with these functions. Multivariate functions are generally hard to visualize since in order to graph them we need n + 1 dimensions: n dimensions for the xi 0 s and one dimension for the y: A function with n = 2 independent variables: y = f (x1 ; x2 ) requires a three dimensional graph, something which can be represented (with di¢culty) 188 CHAPTER 4. MULTIVARIATE CALCULUS 189 on a two-dimensional page. For functions with n ¸ 3 however we really cannot directly visualize a function. Following what we have learned in linear algebra, we can nevertheless know a lot about these functions analytically. For example we will be able to tell which functions are mountains, which are valleys and where the tops and bottoms of these valleys are. It is often tedious to explicitly write out all n of the xi 0 s: Instead we can put all of them in a n £ 1 row vector x as and write y = f (x1 ; x2 ; : : : xn ) more compactly as: y = f (x) : Note that this looks exactly like a function in univariate calculus but where x is now interpreted as an n £ 1 vector. Example: Consider a multivariate function with n = 2 as: 2 2 y = f (x) = f (x1 ; x2 ) = e¡ 2 (x1 +x2 ¡x1 x2 ) : 1 where x = [x1 ; x2 ]T is a 2£ 1 vector. This is a three-dimensional mountain as depicted below: y x2 x1 : ¡ 12 (x21 +x22 ¡x1 x2 ) f (x1 ; x2 ) = e where the vertical axis is y and the two dimensional plane has x1 on one axis and x2 on the other. If x1 = 2 and x2 = 1 we have: 2 y = f (2; 1) = e¡ 2 (2 1 +12 ¡2£1) = 0:22313 while if x1 = ¡ 12 and x2 = 23 then: µ ¶ ³ ´ 2 2 1 2 ¡ 12 (¡ 12 ) +( 23 ) ¡(¡ 12 )£( 23 ) = 0:598: y=f ¡ ; =e 2 3 4.2 Partial Derivatives Mathematics is a language. -Josiah Willard Gibbs CHAPTER 4. MULTIVARIATE CALCULUS 190 The cornerstone of multivariate calculus is the partial derivative. Given a function of n variables: y = f (x1 ; x2 ; : : : xn ) there will be n partial derivatives, one for each one of the xi 0 s: Calculating partial derivatives is really no more di¢cult than calculating an ordinary derivative in univariate calculus. We have: De…nition 231 Partial Derivative: Given the function y = f (x1 ; x2 ; : : : xn ) the partial derivative with respect to xi ; denoted by: @y @f (x1 ; x2 ; : : : xn ) ´ @xi @xi is the ordinary derivative of f (x1 ; x2 ; : : : xn ) with respect to xi obtained by treating all other xj 0 s (for j 6= i ) as constants. Remark 1: For ordinary derivatives we use d as in @y : we use the old German letter d: ‘@’ as in @x i dy dx while partial derivatives Remark 2: Another useful notation for a partial derivative is to write either xi or i as a subscript: @y ´ fi (x1 ; x2 ; : : : xn ) ´ fxi (x1 ; x2 ; : : : xn ) : @xi @y ; this notation emphasizes that the fact that like f (x1 ; x2 ; : : : xn ) the Unlike @x i partial derivative is also a multivariate function of x1 ; x2 ; : : : xn . 
Remark 3: A very bad notation often used by students is to write a partial derivative as: f 0 (x1 ; x2 ; : : : xn ) : The problem here is that a partial derivative is always with respect to a particular xi but this notation does not tell you which xi you are di¤erentiating with respect to. Thus if you write: f 0 (x1 ; x2 ) there is no way of knowing if you 1 ;x2 ) 1 ;x2 ) : Therefore: Do not use the notation: f 0 () for mean @f (x or @f (x @x1 @x2 partial derivatives. Example 1: To calculate the partial derivative of y = f (x1 ; x2 ) = x51 x72 with respect to x1 we treat x2 as a constant and di¤erentiate with respect to x1 to obtain: @f (x1 ; x2 ) @x1 @y @ ¡ 5 7¢ x x ´ @x1 @x1 1 2 = 5x41 x72 : ´ CHAPTER 4. MULTIVARIATE CALCULUS 191 Note that 5x41 x72 is a function of both x1 and x2 ; that is although in the 1 ;x2 ) x2 calculation we treated x2 as a constant, after we have calculated @f (x @x1 reverts to its former status as a variable, just like x1 : That is why we write @f (x1 ;x2 ) (x1 ) : and not @f@x @x1 1 To calculate the derivative of f (x1 ; x2 ) with respect to x2 treat x1 as a constant and di¤erentiate with respect to x2 to obtain: @f (x1 ; x2 ) @x2 @y @ ¡ 5 7¢ x x ´ @x2 @x1 1 2 = 7x51 x62 : ´ Example 2: Given: y = f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 x2 3 3 1 ;x2 ) 1 ;x2 ) 1 ;x2 ) : To calculate @f (x there will be two partial derivatives: @f (x and @f (x @x1 @x2 @x1 we treat x2 as a constant and di¤erentiate with respect to x1 as: µ ¶ @ 2 1 @f (x1 ; x2 ) ln (x1 ) + ln (x2 ) ¡ x1 x2 = @x1 @x1 3 3 1 @ 2 @ @ = ¡ x2 ln (x1 ) + ln (x2 ) x1 3 @x1 3 @x @x1 | {z } | {z } | 1 {z } = x1 1 @f(x1 ;x2 ) @x2 =1 1 1 ¡ x2 : 3 x1 = To calculate to x2 as: =0 since x2 is a constant we treat x1 as a constant and di¤erentiate with respect @f (x1 ; x2 ) @x2 = = µ ¶ @ 2 1 ln (x1 ) + ln (x2 ) ¡ x1 x2 @x2 3 3 2 1 ¡ x1 : 3 x2 Example 3: Given: f (x1 ; x2 ; x3 ) = x23 x31 + 2 ln (x2 ) x21 CHAPTER 4. MULTIVARIATE CALCULUS 192 we have: @f (x1 ; x2 ; x3 ) @x1 = = @f (x1 ; x2 ; x3 ) @x2 = = @f (x1 ; x2 ; x3 ) @x3 = = 4.2.1 ¢ @ ¡ 2 3 x3 x1 + 2 ln (x2 ) x21 @x1 3x23 x21 + 4 ln (x2 ) x1 ; ¢ @ ¡ 2 3 x x + 2 ln (x2 ) x21 @x2 3 1 2x21 ; x2 ¢ @ ¡ 2 3 x3 x1 + 2 ln (x2 ) x21 @x3 2x31 x3 : The Gradient It is often tedious to write down all of the n partial derivatives of f (x1 ; x2 ; : : : xn ) : Just as we can write f (x1 ; x2 ; : : : xn ) as f (x) by letting x be an n£ 1 vector, we can use matrix algebra to obtain a more compact notion by putting each of the n partial derivatives into a n £ 1 vector, called the gradient. We have: De…nition 232 Gradient: Given the function y = f (x) where x is an n £ 1 vector, the gradient is an n £ 1 vector of partial derivatives denoted by: rf (x) (x) : or @f@x 2 6 6 @f (x) ´ rf (x) ´ 6 6 @x 4 @f (x1 ;x2 ;:::xn ) @x1 @f (x1 ;x2 ;:::xn ) @x2 .. . @f (x1 ;x2 ;:::xn ) @xn 3 7 7 7: 7 5 Example 1: Given: f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 x2 3 3 the gradient is a 2 £ 1 vector given by: rf (x1 ; x2 ) = " @f (x1 ;x2 ) @x1 @f (x1 ;x2 ) @x2 # 2 =4 1 3x1 ¡ x2 2 3x2 ¡ x1 Example 2: Given: f (x1 ; x2 ; x3 ) = x23 x31 + 2 ln (x2 ) x21 3 5: CHAPTER 4. MULTIVARIATE CALCULUS 193 the gradient is a 3 £ 1 vector given by: 2 3x23 x21 + 4 ln (x2 ) x1 2x21 x2 rf (x1 ; x2 ; x3 ) = 4 2x3 x31 3 5: Imagine you are standing on a three-dimensional mountain y = f (x1 ; x2 ). Looking at the slope, you turn around until you are looking in the direction where the mountain is steepest or in the direction where the climbing would be the hardest. 
You are then looking in the same direction as the gradient rf (x1 ; x2 ). In general we have: Theorem 233 The gradient rf (x) points in the direction where the function f (x) is steepest. Example: In Example 1 above we calculated the gradient for the function: 1 2 1 2 3 ln (x1 ) + 3 ln (x2 ) ¡ x1 x2 : For x1 = 3 ; x2 = 3 the gradient is: 2 rf (1; 1) = 4 = · 1 3x1 ¡ x2 2 3x2 ¡ x1 1 3 2 3 ¸ 3 5 jx 1 2 1 = 3 ;x2 = 3 and so the function is steepest in the direction that the vector depicted below points: 0.6 0.5 0.4 x2 0.3 0.2 0.1 0 0.05 0.1 0.15 0.2 0.25 0.3 x1 CHAPTER 4. MULTIVARIATE CALCULUS 4.2.2 194 Interpreting Partial Derivatives A partial derivative is much like the result of a controlled experiment. Suppose for example you want to know how vitamin C a¤ects the life expectancy of rats. In a proper experiment you would try and hold all variables constant except vitamin C, vary the consumption of vitamin C and observe what happens to the rats’ life expectancy. If you see the rats with more vitamin C live longer (shorter), you then can conclude that there exists a positive (negative) relationship between vitamin C and the life expectancy of rats. Now instead of real rats suppose that we have a multivariate function y = f (x1 ; x2 ; : : : xn ) where y is life expectancy and xi is vitamin C consumption. Just as with real rats we want to know how xi a¤ects y. Instead of an experiment ;x2 ;:::xn ) we calculate the partial derivative @f(x1@x . Just as with the experiment we i hold all other variables constant except vitamin C when calculating a partial derivative. The sign of this partial derivative then tells us the nature of the relationship between xi and y: In particular: Theorem 234 Given y = f (x1 ; x2 ; : : : xn ) if: @f (x1 ; x2 ; : : : xn ) >0 @xi then y is an increasing function of xi ; that is increasing (decreasing) xi holding all other xj 0 s …xed will increase (decrease) y: Theorem 235 Given y = f (x1 ; x2 ; : : : xn ) if: @f (x1 ; x2 ; : : : xn ) <0 @xi then y is a decreasing function of xi ; that is increasing (decreasing) xi holding all other xi 0 s …xed will decrease (increase) y: Remark: As with univariate calculus, these properties can hold either locally > 0 for all x in the domain we would say that y is a or globally. If @f@x(x) i globally increasing function of xi : If a locally increasing function of xi : @f (x) @xi > 0 only at a point then y would be The partial derivative also gives us quantitative information about the relationship between xi and y; in particular it gives us the xi multiplier. Theorem 236 Given y = f (x1 ; x2 ; : : : xn ) if xi is changed by ¢xi with all other xj 0 s kept constant then the change in y is approximately given by: ¢y ¼ @f (x1 ; x2 ; : : : xn ) ¢xi @xi where the approximation gets better the smaller is ¢xi : CHAPTER 4. 
MULTIVARIATE CALCULUS 195 Example 1: Consider: y = f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 x2 3 3 where: 1 @f (x1 ; x2 ) ¡ x2 : = @x1 3x1 At x1 = 1 12 and x2 = 1 2 we have ¡ 1 1¢ @f 12 ;2 1 7 1 = ¡1¢ ¡ = >0 @x1 2 2 3 12 ¡ 1 1¢ ; 2 or y is a so that a positive relationship exists between x1 and y at 12 locally increasing function of x1 : 1 1 For example if we increase x1 a small amount from x1 = 12 to say x1 = 10 so that 1 1 ¡ = 0:017; ¢x1 = 10 12 ¡ 1 1¢ ;2 = at 12 , then y will increase from y = f 12 and if we keep x2¡ constant ¢ 1 1 ¡1:332 1 to y = f 10 ; 2 = ¡1:2796 or by: ¢y = ¡1:2796 ¡ (¡1:3321) = 0:0525 ¡ 1 1¢ @f 12 ;2 ¼ ¢x1 @x1 7 £ 0:017 = 2 = 0:0595: Now focusing on the relationship between x2 and y we have: @f (x1 ; x2 ) 2 = ¡ x1 @x2 3x2 so that: @f ¡ 1 1 12 ; 2 @x2 ¢ = 1 5 2 1 ¡ ¢¡ = >0 3 12 12 4 so that y is a locally increasing function of x2 : Thus if we increase x2 a small 1 , then y will increase. amount from x1 = 12 , keeping x1 constant at 12 1 On the other hand at x1 = 2 when x2 = 2 we have: ¡ ¢ @f 2; 12 1 1 1 ¡ =¡ <0 = @x1 3 (2) 2 3 ¡ 1¢ @f 2; 2 2 2 ¡1¢ ¡ 2 = ¡ < 0 = @x2 3 3 2 CHAPTER 4. MULTIVARIATE CALCULUS 196 and so¡ it follows that f (x1 ; x2 ) is a locally decreasing function of both x1 and ¢ x2 at 2; 12 : Example 2: Consider the function: y = f (x1 ; x2 ) = e2x1 ¡3x2 : We have: @f (x1 ; x2 ) = 2e2x1 ¡3x2 > 0 @x1 for all x1 ; x2 and so y is a globally increasing function of x1 : Similarly: @f (x1 ; x2 ) = ¡3e2x1 ¡3x2 < 0 @x2 for all x1 ; x2 and so y is a globally decreasing function of x2 : 4.2.3 The Economic Language of Partial Derivatives Consider a demand curve: Q = Q (P; P1 ; P2 ; : : : Pn ; Y ) where P is the own price, the price of other related goods are P1 ; P2 ; : : : Pn income is Y . Now suppose we want to say that the demand curve is downward sloping. What does that mean? In introductory economics we would say that a demand curve is downward sloping if increasing P while holding P1 ; P2 ; : : : Pn and Y results in Q going down. Notice how long it takes to say this! We can be much more concise if we use mathematics and say simply, a demand curve slopes downward if: @Q < 0: @P One of the reasons this is more concise is that instead of saying “all other variables are held constant” we merely write @ instead of d to express this idea. @Q > 0: Similarly if we want to say that the good is normal, we simply write: @Y This then replaces the introductory de…nition which states that a good is normal if increasing Y holding P; P1 ; P2 ; : : : Pn constant causes Q to increase. If we want @Q < 0; if we want to say that good i is to say that a good is inferior we write: @Y @Q a substitute we write: @Pi > 0; if we want to say that good j is a complement @Q < 0. we write: @P j These ideas apply equally well to supply curves, cost functions or just about anything else one considers in economics. Thus much of the informal language one uses in introductory economics is reformulated in terms of partial derivatives in more advanced economics, and this allows us to state ideas much more concisely. CHAPTER 4. MULTIVARIATE CALCULUS 197 Example: Consider a demand curve for co¤ee: Q = P ¡2 P13 P2¡3 Y 2 where P is the price of co¤ee, P1 is the price of tea, P2 is the price of sugar and Y is income. Then: @Q @P @Q @P1 @Q @P2 @Q @Y 4.2.4 = ¡2P ¡3 P13 P2¡3 Y 2 < 0 =) the co¤ee demand curve slopes downward = 3P ¡2 P12 P2¡3 Y 2 > 0 =) co¤ee and tea are substitutes = ¡3P ¡2 P13 P2¡4 Y 2 < 0 =) co¤ee and sugar are complements = 2P ¡2 P13 P2¡3 Y 1 < 0 =) co¤ee is a normal good. 
The Use of the Word Marginal In introductory economics the marginal product of labour is de…ned as the contribution of the last worker hired to output. In univariate calculus we de…ned the marginal product of labour as the derivative of the short-run production function Q = f (L) with respect to L: The short-run production function is obtained from the production function Q = F (L; K) by holding K constant. Since K was held …xed this ordinary derivative was actually a partial derivative. Thus the precise de…nition of the marginal product of labour is as the partial derivative: MPL (L; K) ´ @F (L; K) : @L In general then when economists use the word ‘marginal’ as in marginal utility, marginal product of labour or the marginal product of capital they are referring to a partial derivative where all other variables are held constant. Thus: De…nition 237 When in economics we refer to a ‘marginal’ concept we mean a partial derivative. Example 1: Consider a production function: Q = F (L; K) where Q is output, L is labour and K is capital. The marginal product of labour is the partial derivative: M PL (L; K) ´ @F (L; K) @L CHAPTER 4. MULTIVARIATE CALCULUS 198 while the marginal product of capital is: MPK (L; K) ´ @F (L; K) : @K Thus for the Cobb-Douglas production function: 1 1 Q = L2 K 4 the marginal products of labour and capital are given by: M PL (L; K) = M PK (L; K) = 1 ¡1 1 L 2 K 4 > 0; 2 1 1 ¡3 L 2 K 4 > 0: 4 The fact that the marginal products are positive means that Q is a globally increasing function of L and a globally increasing function of K; that is labour and capital are productive. Example 2: Consider a household which gets utility from two goods Q1 and Q2 as: U = U (Q1 ; Q2 ) : The marginal utility of good 1 is: MU1 (Q1 ; Q2 ) ´ @U (Q1 ; Q2 ) @Q1 while the marginal utility of good 2 is: M U1 (Q1 ; Q2 ) ´ @U (Q1 ; Q2 ) : @Q2 For the Cobb-Douglas utility function: 1 2 U (Q1 ; Q2 ) = Q13 Q23 the marginal utilities of Q1 and Q2 are given by: M U1 (Q1 ; Q2 ) = M U2 (Q1 ; Q2 ) = 1 ¡ 23 23 Q Q2 > 0; 3 1 2 13 ¡ 13 Q Q > 0: 3 1 2 The fact that the marginal utilities are positive means that utility is a globally increasing function of both Q1 and Q2 , in other words both Q1 and Q2 are ‘goods’ and not ‘bads’. CHAPTER 4. MULTIVARIATE CALCULUS 4.2.5 199 Elasticities Instead partial derivatives we often prefer to talk about elasticities since these are free of units of measurement. Again elasticities are de…ned under the assumption that all other variables but one are held …xed. Thus for multivariate functions we de…ne an elasticity as: De…nition 238 Elasticity: Given y = f (x1 ; x2 ; : : : xn ) the elasticity with respect to xi is: ´i (x1 ; x2 ; : : : xn ) ´ ´ @y xi @xi y @f (x1 ; x2 ; : : : xn ) xi : @xi f (x1 ; x2 ; : : : xn ) In general elasticities change as x1 ; x2 ; : : : xn change. The functional form which has the property that the elasticities do not depend on x1 ; x2 ; : : : xn is the multivariate generalization of y = Axb given below: Theorem 239 If: y = f (x1 ; x2 ; : : : xn ) = Axb11 xb22 £ ¢ ¢ ¢ £ xbnn then all elasticities are independent of x1 ; x2 ; : : : xn and: ´ i = bi : Example: Consider again the demand curve for co¤ee: Qd = P ¡2 P13 P2¡3 Y 2 Note that the demand function has the functional form Axb11 xb22 £ ¢ ¢ ¢ and so the elasticities are simply the exponents on each variable. 
We therefore have: ´P = @Qd P ¡2P ¡3 P13 P2¡3 Y 2 £ P = = ¡2 d @P Q P ¡2 P13 P2¡3 Y 2 ´P1 = @Qd P1 3P ¡2 P12 P2¡3 Y 2 £ P1 = =3 @P1 Qd P ¡2 P13 P2¡3 Y 2 ´P2 = @Qd P2 ¡3P ¡2 P13 P2¡4 Y 2 £ P2 = = ¡3 @P2 Qd P ¡2 P13 P2¡3 Y 2 ´Y = @Qd Y 2P ¡2 P13 P2¡3 Y 1 £ Y = = 2: @Y Qd P ¡2 P13 P2¡3 Y 2 Thus a 1% increase in P leads to a 2% fall Q (demand is elastic), a 1% increase in P1 leads to a 3% increase in Q (co¤ee and tea are substitutes), a 1% increase in P2 leads to a 3% decrease Q (co¤ee and sugar are complements), and a 1% increase in Y leads to a 2% increase in Q (co¤ee is a normal good). CHAPTER 4. MULTIVARIATE CALCULUS 4.2.6 200 The Chain Rule We need the chain rule whenever we are working with functions of functions. Consider then the following situation. We have an outside function: y = f (x1 ; x2 ; : : : xn ) and n inside functions: g1 (w) ; g2 (w) ; : : : gn (w) where w is a scalar. We replace each xi with the inside function gi (w) to obtain: h (w) = f (g1 (w) ; g2 (w) ; : : : gn (w)) : The multivariate chain rule then tells us how to calculate h0 (w) : Theorem 240 Multivariate Chain Rule: If h (w) = f (g1 (w) ; g2 (w) ; : : : gn (w)) then: h0 (w) = @f (g1 (w) ; g2 (w) ; : : : gn (w)) 0 g1 (w) @x1 @f (g1 (w) ; g2 (w) ; : : : gn (w)) 0 g2 (w) + @x2 @f (g1 (w) ; g2 (w) ; : : : gn (w)) 0 gn (w) : +¢¢¢ + @xn Remark 1: Think of a multi-national oil company that has n subsidiaries in n di¤erent countries. Suppose that w is the price of oil, that the inside function: gi (w) is the before-tax pro…ts in country i in the local currency and the outside function f (x1 ; x2 ; : : : xn ) converts the pro…ts in each country’s local currency into say US dollars. Thus h (w) gives total pro…ts in US dollars as a function of the price of oil and h0 (w) indicates how pro…ts change as the price of oil changes. The terms in the chain rule take the form: @f (g1 (w) ; g2 (w) ; : : : gn (w)) 0 gi (w) : @xi Here gi0 (w) tells you how pro…ts in country i change as the price of oil changes while the multiplier: @f (g1 (w) ; g2 (w) ; : : : gn (w)) @xi indicates how a change in local currency pro…ts a¤ect aggregate U S dollar profits. The total e¤ect of a change in the price of oil h0 (w) is then the sum of these e¤ects of the n subsidiaries, as indicated by the chain rule. CHAPTER 4. MULTIVARIATE CALCULUS 201 Remark 2: Although the multivariate chain rule might look complicated, it is exactly the same idea as in univariate calculus where one starts by taking the derivative of the outside function, replacing the x with the inside function and then multiplying by the derivative of the inside function. The only di¤erence is 0 that now there are now n xi 0 s and n inside functions: the gi (w) s . One thus has to take the partial derivative of the outside function with respect to each xi , one must replace each xi with the n inside functions, multiply this by gi0 (w) and add up all n terms. A recipe for this goes as follows: A Recipe for the Multivariate Chain Rule Starting with x1 : 1. Take the partial derivative of the outside function f (x1 ; x2 ; : : : xn ) with ;x2 ;:::xn ) : respect to xi : @f (x1@x i @f (x1 ;x2 ;:::xn ) with @xi @f (g1 (w);g2 (w);:::gn (w)) @xi 2. Replace every occurrence of xi in inside function gi (w) to obtain: 3. Multiply the result in 2) by gi0 (w) : to obtain: the corresponding @f (g1 (w);g2 (w);:::gn (w)) @xi gi0 (w) 4. Repeat steps 1 to 3 for all xi : 5. Add up the results of 1 to 4 together to obtain h0 (w). 
Example 1: Consider a function with two x0 s : f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 ¡ x2 3 3 and let g1 (w) = w2 and g2 (w) = ew be the two inside functions. If we replace every occurrence of x1 with w2 and every occurrence of x2 with ew we obtain: h (w) = f (g1 (w) ; g2 (w)) ¢ ¡ = f w2 ; ew 1 ¡ 2¢ 2 = ln w + ln (ew ) ¡ w2 ¡ ew 3 3 2 2 ln (w) + w ¡ w2 ¡ ew : = 3 3 We can then calculate h0 (w) directly as: h0 (w) = 2 2 + ¡ 2w ¡ ew : 3w 3 Now let us now use the recipe for the multivariate chain rule to calculate h0 (w) : Following the recipe we have: CHAPTER 4. MULTIVARIATE CALCULUS 202 1. The partial derivative of the outside function with respect to x1 is: 1 @f (x1 ; x2 ) ¡ 1: = @x1 3x1 2. Replacing x1 with w2 and x2 with ew in 1 results in: ¡ ¢ @f w2 ; ew 1 ¡ 1: = @x1 3w2 3. Since g1 (w) = w2 we have: g10 (w) = 2w and so multiplying 2 with g10 (w) yields: ¡ ¢ ¶ µ @f w2 ; ew 0 1 ¡ 1 £ 2w: g1 (w) = @x1 3w2 4. Repeat steps 1 to 3 with respect to x2 : Thus with x2 we have: 2 @f (x1 ; x2 ) ¡1 = @x2 3x2 and so replacing x1 and x2 by w2 and ew : ¢ ¡ @f w2 ; ew 2 = w ¡1 @x2 3e and multiplying by g20 (w) = ew we get: µ ¶ 2 ¡ 1 £ ew : 3ew 5. Adding up the results from 1 to 4 yields: µ ¶ µ ¶ 1 2 0 ¡ 1 £ 2w + ¡ 1 £ ew h (w) = 3w2 3ew 2 2 + ¡ 2w ¡ ew = 3w 3 and so we get the same answer as the direct calculation. Example 2: Proving the Product Rule Using the Chain Rule. Suppose the outside function is p (x1 ; x2 ) = x1 x2 so that p (x1 ; x2 ) simply multiplies x1 and x2 together. The two inside functions are any two univariate functions: f (x) and g (x) where x here is a scalar so that: h (x) = p (f (x) ; g (x)) = f (x) g (x) CHAPTER 4. MULTIVARIATE CALCULUS 203 and so h (x) is just the product of two functions f (x) and g (x) : To …nd h0 (x) we use the multivariate chain rule. We have: @p (x1 ; x2 ) @p (x1 ; x2 ) = x2 and = x1 @x1 @x2 so that: @p (f (x) ; g (x)) 0 @p (f (x) ; g (x)) 0 f (x) + g (x) @x1 @x2 = g (x) f 0 (x) + f (x) g 0 (x) h0 (x) = which is the product rule in univariate calculus. 4.2.7 A More General Multivariate Chain Rule The chain rule can be generalized to the case where the inside functions are multivariate functions. Although this is more general, if you understand the previous chain rule there are really no new ideas involved except that some things which were ordinary derivatives become partial derivatives. Theorem 241 Multivariate Chain Rule: If w in gi (w) is an m £ 1 vector as: gi (w) = gi (w1 ; w2 ; : : : wm ) and if h (w1 ; w2 ; : : : wm ) = f (g1 (w) ; g2 (w) ; : : : gn (w)) then: @h (w1 ; w2 ; : : : wm ) @wj 4.2.8 = @f (g1 (w) ; g2 (w) ; : : : gn (w)) @g1 (w1 ; w2 ; : : : wm ) @x1 @wj @f (g1 (w) ; g2 (w) ; : : : gn (w)) @g2 (w1 ; w2 ; : : : wm ) + @x2 @wj @f (g1 (w) ; g2 (w) ; : : : gn (w)) @gn (w1 ; w2 ; : : : wm ) +¢¢¢ + : @xn @wj Homogeneous Functions In agriculture, the state of the art being given, doubling the labour does not double the produce. -John Stuart Mill In economics one encounters many functions which are homogeneous. Demand and supply curves are always homogenous of degree 0; the marginal utility of income is always homogenous of degree ¡1 and cost functions are always homogeneous of degree 1: The homogeneity of a production function determines whether there are decreasing (small is beautiful), constant or increasing returns to scale (bigger is better). Homogeneity is de…ned as follows: CHAPTER 4. 
MULTIVARIATE CALCULUS 204 De…nition 242 A function f (x1 ; x2 ; : : : xn ) is said to be homogeneous of degree k if and only if for any ¸ > 0 : f (¸x1 ; ¸x2 ; : : : ¸xn ) = ¸k f (x1 ; x2 ; : : : xn ) : Remark: To prove that a given function is homogeneous of some degree, one begins with: f (¸x1 ; ¸x2 ; : : : ¸xn ) and through a series of derivations one tries to obtain: ¸k f (x1 ; x2 ; : : : xn ) : The exponent on ¸ then gives the degree of the homogeneity. Example 1: The Cobb-Douglas production function Q = F (L; K) = AL® K ¯ is homogeneous of degree k = ® + ¯; the sum of the exponents on capital and labour, since: A (¸L)® (¸K)¯ A¸® L® ¸¯ K ¯ ¸®+¯ AL® K ¯ ¸®+¯ F (L; K) : F (¸L; ¸K) = = = = 1 1 Thus Q = L 2 K 4 is homogeneous of degree k = 1 2 + 1 4 = 34 : Example 2: The Constant Elasticity of Substitution or CES production function: ° Q = F (L; K) = (®L½ + (1 ¡ ®) K ½ ) ½ is homogeneous of degree ° since: ° F (¸L; ¸K) = (® (¸L)½ + (1 ¡ ®) (¸K)½ ) ½ ° = (¸½ (®L½ + (1 ¡ ®) K ½ )) ½ ° ° = (¸½ ) ½ (®L½ + (1 ¡ ®) K ½ ) ½ ° = ¸° (®L½ + (1 ¡ ®) K ½ ) ½ = ¸° F (L; K) : For example if ® = 12 ; ½ = ¡1 and ° = 1 then the CES production function: ¢¡1 ¡ Q = 12 L¡1 + 12 K ¡1 is homogenous of degree 1: An important calculus property of homogeneous functions is Euler’s theorem: CHAPTER 4. MULTIVARIATE CALCULUS 205 Theorem 243 Euler’s Theorem. If f (x1 ; x2 ; : : : xn ) is homogeneous of degree k then: kf (x1 ; x2 ; : : : xn ) = @f (x1 ; x2 ; : : : xn ) @f (x1 ; x2 ; : : : xn ) @f (x1 ; x2 ; : : : xn ) x1 + x2 + ¢ ¢ ¢ + xn : @x1 @x2 @xn Proof. Let: h (¸) = f (¸x1 ; ¸x2 ; : : : ¸xn ) = ¸k f (x1 ; x2 ; : : : xn ) : Using the multivariate chain rule on f (¸x) and the fact that …nd that: d¸k d¸ = k¸k¡1 we @f (¸x1 ; ¸x2 ; : : : ¸xn ) @f (¸x1 ; ¸x2 ; : : : ¸xn ) x1 + ¢ ¢ ¢ + xn @x1 @xn = k¸k¡1 f (x1 ; x2 ; : : : xn ) : h0 (¸) = Now set ¸ = 1 and the result follows. Example 1: We have seen that: 1 1 F (L; K) = L 2 K 4 is homogeneous of degree k = = = = = = 1 2 + 1 4 = 34 : To verify Euler’s theorem note that: @F (L; K) @F (L; K) £L+ £K @L µ ¶ µ@K ¶ @ 1 1 @ 12 14 L2 K 4 £ L + L K £K @L @K 1 1 1 1 1 1 £ L 2 ¡1 K 4 £ L + £ L 2 K 4 ¡1 £ K 2 4 1 1 1 1 1 1 £ L2 K 4 + £ L2 K 4 2 4 1 1 3 £ L2 K 4 4 3 £ F (L; K) : 4 Example 2: Euler’s theorem allows us to make predictions about a competitive …rm’s pro…ts. Suppose Q = F (L; K) is homogeneous of degree k: Then by Euler’s theorem: kQ = @F (L; K) @F (L; K) L+ K: @L @K CHAPTER 4. MULTIVARIATE CALCULUS 206 A perfectly competitive …rm pro…t maximizes pro…ts by setting: (L;K) =R and @F@K P so that: kQ = @F (L;K) @L = W P R W L + K =) kP Q = W L + RK P P and so pro…ts ¼ are given by: ¼ = P Q ¡ (W L + RK) = P Q ¡ kP Q = (1 ¡ k) P Q: Thus if 0 < k < 1 (there are decreasing returns to scale) then ¼ > 0 while if k = 1 (constant returns to scale) then ¼ = 0: If k > 1 then pro…ts must be negative, which is indicative of the fact that increasing returns to scale are not consistent with perfect competition. Another useful calculus result for homogeneous functions is: Theorem 244 If f (x1 ; x2 ; : : : xn ) is homogeneous of degree k then @f (x1 ; x2 ; : : : xn ) @xi is homogeneous of degree k ¡ 1: 1 1 Example: While Q = L 2 K 4 is homogeneous of degree k = product of labour: 3 4 the marginal @ ³ 1 1 ´ 1 ¡1 1 L2 K 4 = L 2 K 4 @L 2 is homogeneous of degree k ¡ 1 = 4.2.9 3 4 ¡ 1 = ¡ 14 . 
Homogeneity and the Absence of Money Illusion Consider a demand function: Q = Q (P; P1 ; P2 ; : : : Pn ; Y ) where P is the own price, P1 P2 ; : : : Pn are the prices of related goods and Y is nominal income. Now suppose there is a general in‡ation so that all prices and incomes increase by the same proportion ¸. For example suppose ¸ = 2 so that all prices and incomes double. This means that real income and all real prices have stayed the same and so a rational household, that is one that does not su¤er from money illusion, will not change any of its real behavior and so Q remains the same. CHAPTER 4. MULTIVARIATE CALCULUS 207 Mathematically this means that: Q (¸P; ¸P1 ; ¸P2 ; : : : ¸Pn ; ¸Y ) = Q (P; P1 ; P2 ; : : : Pn ; Y ) = ¸0 Q (P; P1 ; P2 ; : : : Pn ; Y ) : Thus the absence of money illusion is equivalent to the demand function being homogeneous of degree k = 0: This logic also applies to supply curves as well as many other functions in economics. 4.2.10 Homogeneity and the Nature of Technology In economics the nature of technology is often critical. What is often critical is what happens as the scale of production is increased; is bigger better or is small beautiful? This can be captured by the degree of homogeneity of the production function. Suppose that a production function Q = F (L; K) is homogeneous of degree k: If we double the size of operation, so that ¸ = 2 then: F (2L; 2K) = 2k F (L; K) : This says that doubling the scale of operation causes output to increase by a factor of 2k : Now: 1 3 1. If k > 1 (e.g. F (L; K) = L 2 K 4 and k = 54 ) then doubling the scale leads to more than twice the output since then 2k > 1: We then say that the technology exhibits increasing returns to scale. Bigger is better. 1 1 2. If k = 1 (e.g. F (L; K) = L 2 K 2 and k = 1) then doubling the scale leads to exactly twice the output since then 21 = 2: We then say that the technology exhibits constant returns to scale. 1 1 3. If k < 1 (e.g. F (L; K) = L 2 K 4 and k = 34 ) then doubling the scale leads to less than twice the output since then 2k < 2: We then say that the technology exhibits decreasing returns to scale. Small is beautiful. 4.3 Second-Order Partial Derivatives We are going to be interested in second-order partial derivatives when we discuss the concavity, convexity and second-order conditions of multivariate functions. We have: De…nition 245 Given y = f (x1 ; x2 ; : : : xn ) the second-order partial derivative with respect to xi and xj is: µ ¶ @ @f (x1 ; x2 ; : : : xn ) @ 2 f (x1 ; x2 ; : : : xn ) = : @xj @xi @xj @xi CHAPTER 4. MULTIVARIATE CALCULUS 208 Remark 1: If there are n xi 0 s then there are n …rst-order partial derivatives and n2 second-order partial derivatives. 
For example the function y = f (x1 ; x2 ) has 2 …rst-order partial derivatives but 22 = 4 second-order partial derivatives: @ 2 f (x1 ; x2 ) @ 2 f (x1 ; x2 ) @ 2 f (x1 ; x2 ) @ 2 f (x1 ; x2 ) ; ; ; : @x1 @x1 @x1 @x2 @x2 @x1 @x2 @x2 Remark 2: The notation is usually a little di¤erent when we di¤erentiate twice with respect to the same xi in which case we typically (but not always) write: @ 2 f (x1 ; x2 ; : : : xn ) @x2i and not @ 2 f (x1 ; x2 ; : : : xn ) ; @xi @xi that is instead of @xi @xi in the denominator one writes @x2i : Example: Consider: 1 2 ln (x1 ) + ln (x2 ) ¡ x1 x2 : 3 3 We have 4 second-order partial derivatives: µ ¶ @ @f (x1 ; x2 ) @ 2 f (x1 ; x2 ) = @x21 @x1 @x1 µ ¶ @ 1 1 ¡ x2 = ¡ 2 = @x1 3x1 3x1 µ ¶ @ @f (x1 ; x2 ) @ 2 f (x1 ; x2 ) = @x2 @x1 @x2 @x1 µ ¶ @ 1 ¡ x2 = ¡1 = @x2 3x1 µ ¶ @ @f (x1 ; x2 ) @ 2 f (x1 ; x2 ) = @x1 @x2 @x1 @x2 µ ¶ 2 @ ¡ x1 = ¡1 = @x1 3x2 µ ¶ @ @f (x1 ; x2 ) @ 2 f (x1 ; x2 ) = @x22 @x2 @x2 µ ¶ @ 2 2 ¡ x1 = ¡ 2 : = @x2 3x2 3x2 y = f (x1 ; x2 ) = Note that in this example @ 2 f (x1 ; x2 ) @ 2 f (x1 ; x2 ) = ; @x1 @x2 @x2 @x1 CHAPTER 4. MULTIVARIATE CALCULUS 209 that is we get the same result if we …rst di¤erentiate with respect to x1 and then with respect to x2 as when we …rst di¤erentiate with respect to x2 and then with respect to x1 : Alternatively we get the same result if we …rst apply @ @ @ @ @x1 and then @x2 or if we …rst apply @x2 and then @x1 : This turns is not a coincidence but is true for all functions. This very useful result is known as: Young’s theorem. Theorem 246 Young’s Theorem: Given y = f (x1 ; x2 ; : : : xn ) di¤erentiating …rst with respect to xi and then with respect to xj gives the same result as di¤erentiating …rst with respect to xj and then with respect to xi so that: µ ¶ µ ¶ @ @f (x1 ; x2 ; : : : xn ) @ @f (x1 ; x2 ; : : : xn ) = @xi @xj @xj @xi or: @ 2 f (x1 ; x2 ; : : : xn ) @ 2 f (x1 ; x2 ; : : : xn ) = : @xi @xj @xj @xi Example: Given: 2 2 y = f (x1 ; x2 ) = e¡ 2 (x1 +x2 ) 1 if we …rst di¤erentiate with respect to x1 we have : ´ 2 2 2 2 2 2 1 1 1 @ 2 f (x1 ; x2 ) @ ³ @f (x1 ; x2 ) ¡x1 e¡ 2 (x1 +x2 ) = x1 x2 e¡ 2 (x1 +x2 ) = ¡x1 e¡ 2 (x1 +x2 ) =) = @x1 @x2 @x1 @x2 while if we …rst di¤erentiate with respect to x2 we have : ´ 2 2 2 2 2 2 1 1 1 @ 2 f (x1 ; x2 ) @ ³ @f (x1 ; x2 ) ¡x2 e¡ 2 (x1 +x2 ) = x1 x2 e¡ 2 (x1 +x2 ) : = ¡x2 e¡ 2 (x1 +x2 ) =) = @x2 @x1 @x2 @x1 Both yield the same result, as required by Young’s theorem. 4.3.1 The Hessian A multivariate function y = f (x1 ; x2 ; : : : xn ) has a large number of second-order partial derivatives. The best way to organize these n2 second derivatives is to put them into an n £ n matrix called the Hessian, as de…ned below: De…nition 247 Hessian: Given y = f (x1 ; x2 ; : : : xn ) = f (x) n £ 1 vector the Hessian is: 2 @ 2 f (x) @ 2 f (x) @ 2 f (x) ¢ ¢ ¢ @x @x1 @x2 @x21 1 @xn 6 @ 2 f (x) @ 2 f (x) @ 2 f (x) 6 ¢ ¢ ¢ 6 @x2 @x1 @x2 @xn @x22 H (x1 ; x2 ; : : : xn ) = 6 .. .. .. .. 6 . . . . 4 @ 2 f (x) @ 2 f (x) @ 2 f (x) ¢¢¢ @xn @x1 @xn @x2 @x2 n where x is an 3 7 7 7 7: 7 5 CHAPTER 4. 
MULTIVARIATE CALCULUS 210 Note that by Young’s theorem @ 2 f (x) @ 2 f (x) = @xi @xj @xj @xi and so the elements above the diagonal of the Hessian are equal to the corresponding elements below the diagonal and so: Theorem 248 Matrix Version of Young’s Theorem: The Hessian: H (x1 ; x2 ; : : : xn ) is a symmetric matrix or: H (x1 ; x2 ; : : : xn ) = H (x1 ; x2 ; : : : xn )T : Remark: Young’s theorem reduces the number of di¤erent second-order partial derivatives that we need to calculate from the n2 elements in the Hessian to the n(n+1) elements on and above the diagonal. For example if n = 4 rather 2 = 10 than calculating 42 = 16 second derivatives we need only calculate: 4(4+1) 2 di¤erent second derivatives. Calculating the Elements of the Hessian If you have trouble remembering how to construct the Hessian write down a blank square matrix and along the top and left-side of the matrix make a list of the xi 0 s as follows: x1 x2 .. . xn x21 6 6 6 .. 4 . x2 ¢¢¢ ¢¢¢ ¢¢¢ x3n 7 7 7 5 : To …ll in i; j th entry read the corresponding xi to the left and xj above and di¤erentiate with respect to these two variables. Example 1: For the function: f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 x2 3 3 the second derivatives are: @ 2 f (x1 ; x2 ) @x21 @ 2 f (x1 ; x2 ) @x22 1 @ 2 f (x1 ; x2 ) @ 2 f (x1 ; x2 ) ; = = ¡1; 2 3x1 @x2 @x1 @x1 @x2 2 = ¡ 2: 3x2 = ¡ To calculate the Hessian: H (x1 ; x2 ) one writes : CHAPTER 4. MULTIVARIATE CALCULUS ·x1 x1 x2 x¸2 ? 211 : Thus for the 1; 2 element where the ? is placed, we di¤erentiate …rst with respect to x1 (o¤ the left side of the 1; 2 element), and then with respect to x2 (directly above the 1; 2 element). This then is @ 2 f (x1 ; x2 ) = ¡1: @x1 @x2 By Young’s theorem the 2; 1 and 1; 2 elements are identical and so we obtain: x1 x2 · x1 x2 ¸ ? ¡1 : ¡1 To obtain the 1; 1 element where the ? is now placed, we di¤erentiate …rst with respect to x1 ; reading left, and then again with respect to x1 reading above which is: 1 @ 2 f (x1 ; x2 ) =¡ 2 2 @x1 3x1 and so we now have: x1 x2 · x1 x2 ¸ ¡ 3x12 ¡1 1 : ¡1 ? To …nish our calculation we calculate the 2; 2 element where the ? is placed by di¤erentiating twice with respect to x2 as: 2 @ 2 f (x1 ; x2 ) =¡ 2 @x22 3x2 and so the Hessian is given by: H (x1 ; x2 ) = " ¡ 3x12 1 ¡1 ¡1 ¡ 3x22 2 # : Example 2: For the Cobb-Douglas production function: 1 1 Q = F (L; K) = L 2 K 4 CHAPTER 4. MULTIVARIATE CALCULUS 212 we calculate the Hessian by …rst listing L and K along the top and left-side of a 2 £ 2 matrix as: L K · L K¸ ? : Thus for the 1; 2 element where the ? is placed we di¤erentiate once with respect to L and once with respect to K yielding: ¶ µ 1 @ @ 2 F (L; K) 1 1 1 3 1 1 ¡ 12 14 @F (L; K) = L¡ 2 K 4 =) = L K = L¡ 2 K ¡ 4 : @L 2 @L@K @K 2 8 By Young’s theorem the 1; 2 and 2; 1 elements are identical and so we obtain: L K · L ? 3 1 ¡ 12 K¡ 4 8L K 3 1 ¡ 12 K¡ 4 8L ¸ : To …nd the 1; 1 element where the ? is placed we di¤erentiate twice with respect to L so that: ¶ µ 1 ¡1 1 @ 2 F (L; K) @ 3 1 1 1 ¡ 12 14 @F (L; K) 2 4 = L K =) L K = = ¡ L¡ 2 K 4 @L 2 @L2 @L 2 4 and so we obtain: L K · L 3 1 ¡ 14 L¡ 2 K 4 3 1 ¡ 12 K¡ 4 8L K 3 1 ¡ 12 K¡ 4 8L ? ¸ : Finally to …nd the 2; 2 element where the ? 
is placed we di¤erentiate twice with respect to K so that: ¶ µ 7 1 12 ¡ 34 @ 2 F (L; K) @ 3 1 1 12 ¡ 34 @F (L; K) = L K L K =) = = ¡ L 2 K¡ 4 2 @K 4 @K @K 4 16 so that the Hessian is given by: · 1 ¡3 1 ¡4L 2 K 4 H (L; K) = 3 1 ¡ 12 K¡ 4 8L 4.3.2 1 ¡ 12 ¡ 34 8L 1K 7 3 ¡ 16 L 2 K¡ 4 ¸ : Concavity and Convexity In univariate calculus a function y = f (x) is concave (a mountain) if f 00 (x) < 0 and convex (a valley) if f 00 (x) > 0 where these mountains and valleys are in the two dimensional space <2 . CHAPTER 4. MULTIVARIATE CALCULUS 213 In multivariate calculus with n xi 0 s : y = f (x1 ; x2 ; : : : xn ) instead of f 00 (x) we look at the n £ n Hessian: H (x1 ; x2 ; : : : xn ) to determine if a function is concave (a mountain) or convex (a valley) in the n + 1 dimensional space <n+1 : Let us start with some easy examples. Consider the multivariate function f (x1 ; x2 ) = ¡ plotted below: ¢ 1¡ 2 x1 + x22 2 y x2 x1 : ¡ ¢ f (x1 ; x2 ) = ¡ 12 x21 + x22 which is concave (a mountain) in 3 dimensions. You can verify that the Hessian for this function is: · ¸ ¡1 0 H (x1 ; x2 ) = 0 ¡1 which is a negative de…nite matrix (since it is a diagonal matrix with negative elements along the diagonal). Note the parallel: in univariate calculus a function is concave if f 00 (x) is negative, in multivariate calculus a function is concave if its Hessian H (x) is negative de…nite. The function: f (x1 ; x2 ) = ¢ 1¡ 2 x1 + x22 2 CHAPTER 4. MULTIVARIATE CALCULUS 214 which is plotted below: y x2 x1 : ¡ ¢ f (x1 ; x2 ) = 12 x21 + x22 is convex or a valley in 3 dimensions. You can verify that the Hessian for this function is: · ¸ 1 0 H (x1 ; x2 ) = 0 1 which is a positive de…nite matrix (since it is diagonal with positive elements along the diagonal). Again note the parallel: in univariate calculus a function is convex if f 00 (x) is positive, in multivariate calculus a function is convex if its Hessian H (x) is positive de…nite. We have: De…nition 249 Concavity: The function y = f (x1 ; x2 ; : : : xn ) is concave if the Hessian: H (x1 ; x2 ; : : : xn ) is a negative de…nite matrix. De…nition 250 Convexity: The function y = f (x1 ; x2 ; : : : xn ) is convex if the Hessian: H (x1 ; x2 ; : : : xn ) is a positive de…nite matrix. Remark: As before we can distinguish between local concavity and convexity and global concavity and convexity. Thus De…nition 251 Local Concavity: The function y = f (x1 ; x2 ; : : : xn ) is locally concave at a point: x01 ; x02 ; : : : x0n if the Hessian evaluated at x01 ; x02 ; : : : x0n or ¢ ¡ H x01 ; x02 ; : : : x0n is a negative de…nite matrix. De…nition 252 Local Convexity: The function y = f (x1 ; x2 ; : : : xn ) is lo0 0 0 0 0 0 cally ¡ convex at a¢ point: x1 ; x2 ; : : : xn if the Hessian evaluated at x1 ; x2 ; : : : xn or H x01 ; x02 ; : : : x0n is a positive de…nite matrix. De…nition 253 Global Concavity: The function y = f (x1 ; x2 ; : : : xn ) is globally concave if the Hessian: H (x1 ; x2 ; : : : xn ) is a negative de…nite matrix for all x1 ; x2 ; : : : xn in the domain of f (x1 ; x2 ; : : : xn ) : CHAPTER 4. 
Definition 254 Global Convexity: The function $y = f(x_1, x_2, \ldots, x_n)$ is globally convex if the Hessian $H(x_1, x_2, \ldots, x_n)$ is a positive definite matrix for all $x_1, x_2, \ldots, x_n$ in the domain of $f(x_1, x_2, \ldots, x_n)$.

Example 1: We have seen from a previous example that the function
\[ y = f(x_1,x_2) = \frac{1}{3}\ln(x_1) + \frac{2}{3}\ln(x_2) - x_1 x_2 \]
has a Hessian
\[ H(x_1,x_2) = \begin{bmatrix} -\frac{1}{3x_1^2} & -1 \\ -1 & -\frac{2}{3x_2^2} \end{bmatrix}. \]
The function is locally concave at $x_1^0 = \frac{1}{2}$ and $x_2^0 = \frac{1}{3}$ since:
\[ H\left(\frac{1}{2},\frac{1}{3}\right) = \begin{bmatrix} -\frac{1}{3(1/2)^2} & -1 \\ -1 & -\frac{2}{3(1/3)^2} \end{bmatrix} = \begin{bmatrix} -\frac{4}{3} & -1 \\ -1 & -6 \end{bmatrix} \]
is a negative definite matrix, as can be shown from the leading principal minors, where $M_1 = -\frac{4}{3} < 0$ and $M_2 = 7 > 0$, or from the eigenvalues, which are $\lambda_1 = -1.13 < 0$ and $\lambda_2 = -6.21 < 0$.

The function is not globally concave since at another point, where $x_1^0 = x_2^0 = 1$:
\[ H(1,1) = \begin{bmatrix} -\frac{1}{3} & -1 \\ -1 & -\frac{2}{3} \end{bmatrix} \]
which is not negative definite, since this requires that both eigenvalues be negative but:
\[ \lambda_1 = -\frac{1}{2} + \frac{1}{6}\sqrt{37} = 0.513 > 0, \qquad \lambda_2 = -\frac{1}{2} - \frac{1}{6}\sqrt{37} = -1.514 < 0. \]

Example 2: Consider the Cobb-Douglas production function:
\[ Q = L^{1/2}K^{1/4}. \]
[Figure: surface plot of $Q = L^{1/2}K^{1/4}$ for $0 \le L, K \le 5$.] From the graph the function appears mountain-like or concave. To verify this let us look at the Hessian calculated in a previous example:
\[ H(L,K) = \begin{bmatrix} -\frac{1}{4}L^{-3/2}K^{1/4} & \frac{1}{8}L^{-1/2}K^{-3/4} \\ \frac{1}{8}L^{-1/2}K^{-3/4} & -\frac{3}{16}L^{1/2}K^{-7/4} \end{bmatrix}. \]
The first leading principal minor $M_1$ is negative for all $L$ and $K$ since:
\[ M_1 = -\frac{1}{4}L^{-3/2}K^{1/4} < 0. \]
The second leading principal minor $M_2$ is positive since:
\[ M_2 = \det \begin{bmatrix} -\frac{1}{4}L^{-3/2}K^{1/4} & \frac{1}{8}L^{-1/2}K^{-3/4} \\ \frac{1}{8}L^{-1/2}K^{-3/4} & -\frac{3}{16}L^{1/2}K^{-7/4} \end{bmatrix} = \frac{3}{64}L^{-1}K^{-3/2} - \frac{1}{64}L^{-1}K^{-3/2} = \frac{1}{32}L^{-1}K^{-3/2} > 0. \]
It follows that $H(L,K)$ is a negative definite matrix for all $L$ and $K$ and hence that $Q = L^{1/2}K^{1/4}$ is globally concave.

You may want to attempt to prove the following results:

Theorem 255 The Cobb-Douglas production function $f(L,K) = L^{\alpha}K^{\beta}$ with $\alpha > 0$ and $\beta > 0$ is globally concave if and only if $\alpha + \beta < 1$.

Theorem 256 $f(x_1, x_2, \ldots, x_n)$ is globally concave (convex) if and only if $g(x_1, x_2, \ldots, x_n) = -1 \times f(x_1, x_2, \ldots, x_n)$ is globally convex (concave).

Theorem 257 If $f(x_1, x_2, \ldots, x_n)$ is globally concave (convex) and $g(x_1, x_2, \ldots, x_n)$ is globally concave (convex), then $h(x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n) + g(x_1, x_2, \ldots, x_n)$ is globally concave (convex).

4.3.3 First and Second-Order Taylor Series

We saw in univariate calculus that Taylor series can be used to approximate an arbitrary function by a linear function or a quadratic. Similar results apply for multivariate functions. In particular:

Theorem 258 If $x$ is an $n \times 1$ vector and $f(x)$ is a multivariate function, a first-order Taylor series of $f(x)$ around the point $x^0$ is given by:
\[ f(x^0) + \nabla f(x^0)^T (x - x^0) \]
while a second-order Taylor series around $x^0$ is given by:
\[ f(x^0) + \nabla f(x^0)^T (x - x^0) + \frac{1}{2}(x - x^0)^T H(x^0)(x - x^0) \]
where $\nabla f(x)$ is the gradient and $H(x)$ the Hessian of $f(x)$.

Example: Given:
\[ f(x_1,x_2) = x_1^5 x_2^3 \]
suppose we wish to calculate a Taylor series approximation around $x_1^0 = 1$ and $x_2^0 = 2$. We then have:
\[ f(1,2) = (1)^5 (2)^3 = 8 \]
and
\[ \frac{\partial f(x_1,x_2)}{\partial x_1} = 5x_1^4 x_2^3 \implies \frac{\partial f(1,2)}{\partial x_1} = 5(1)^4(2)^3 = 40 \]
\[ \frac{\partial f(x_1,x_2)}{\partial x_2} = 3x_1^5 x_2^2 \implies \frac{\partial f(1,2)}{\partial x_2} = 3(1)^5(2)^2 = 12 \]
so that the gradient at $(1,2)$ is
\[ \nabla f(x^0) \equiv \nabla f(1,2) = \begin{bmatrix} 40 \\ 12 \end{bmatrix} \]
and a first-order Taylor series around $x_1^0 = 1$ and $x_2^0 = 2$ would be
\[ f(x^0) + \nabla f(x^0)^T (x - x^0) = 8 + \begin{bmatrix} 40 & 12 \end{bmatrix}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right) = 8 + 40 \times (x_1 - 1) + 12 \times (x_2 - 2). \]
To calculate a second-order Taylor series we need the second derivatives:
\[ \frac{\partial^2 f(x_1,x_2)}{\partial x_1^2} = 20x_1^3 x_2^3 \implies \frac{\partial^2 f(1,2)}{\partial x_1^2} = 20(1)^3(2)^3 = 160 \]
\[ \frac{\partial^2 f(x_1,x_2)}{\partial x_1 \partial x_2} = 15x_1^4 x_2^2 \implies \frac{\partial^2 f(1,2)}{\partial x_1 \partial x_2} = 15(1)^4(2)^2 = 60 \]
\[ \frac{\partial^2 f(x_1,x_2)}{\partial x_2^2} = 6x_1^5 x_2 \implies \frac{\partial^2 f(1,2)}{\partial x_2^2} = 6(1)^5(2)^1 = 12 \]
so that the Hessian at $(1,2)$ is:
\[ H(1,2) = \begin{bmatrix} 160 & 60 \\ 60 & 12 \end{bmatrix}. \]
Thus the second-order Taylor series is:
\[ f(x^0) + \nabla f(x^0)^T (x - x^0) + \frac{1}{2}(x - x^0)^T H(x^0)(x - x^0) \]
\[ = 8 + 40(x_1 - 1) + 12(x_2 - 2) + \frac{1}{2}\begin{bmatrix} x_1 - 1 & x_2 - 2 \end{bmatrix}\begin{bmatrix} 160 & 60 \\ 60 & 12 \end{bmatrix}\begin{bmatrix} x_1 - 1 \\ x_2 - 2 \end{bmatrix} \]
\[ = 8 + 40 \times (x_1 - 1) + 12 \times (x_2 - 2) + 80(x_1 - 1)^2 + 6(x_2 - 2)^2 + 60(x_1 - 1)(x_2 - 2). \]

4.4 Unconstrained Optimization

4.4.1 First-Order Conditions

The first-order conditions for a maximum or minimum of a function of $n$ variables, $y = f(x_1, x_2, \ldots, x_n)$, are:

Theorem 259 First-Order Conditions: If $x_1^*, x_2^*, \ldots, x_n^*$ maximizes or minimizes the function $y = f(x_1, x_2, \ldots, x_n)$ then:
\[ \frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_1} = 0, \quad \frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_2} = 0, \quad \cdots, \quad \frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_n} = 0. \]

Proof. (by contradiction): Suppose $x_1^*, x_2^*, \ldots, x_n^*$ maximizes (minimizes) $y$, and suppose it were the case that $\frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_i} > 0$. It follows then that $y$ is a locally increasing function of $x_i$, so that increasing (decreasing) $x_i$ and keeping all other variables fixed would increase (decrease) $y$. This however contradicts $x_1^*, x_2^*, \ldots, x_n^*$ being a maximum (minimum), and so $\frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_i} > 0$ is not possible. Similarly, if $\frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_i} < 0$ then $y$ is a locally decreasing function of $x_i$, and so if we decreased (increased) $x_i$ keeping all other variables fixed then $y$ would increase (decrease). Again this contradicts $x_1^*, x_2^*, \ldots, x_n^*$ being a maximum (minimum), and so $\frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_i} < 0$ is not possible. It follows then that:
\[ \frac{\partial f(x_1^*, x_2^*, \ldots, x_n^*)}{\partial x_i} = 0. \]

We can also express the first-order conditions more compactly using the gradient $\nabla f(x) \equiv \frac{\partial f(x)}{\partial x}$ evaluated at $x^*$, so that:

Theorem 260 If the $n \times 1$ vector $x^*$ maximizes or minimizes $y = f(x)$ then:
\[ \nabla f(x^*) \equiv \frac{\partial f(x^*)}{\partial x} = 0 \]
where $0$ is an $n \times 1$ vector of zeros.

Remark: The first-order conditions for a maximum or minimum involve $n$ equations in $n$ unknowns $x_1^*, x_2^*, \ldots, x_n^*$. If the problem is "nice" then it is sometimes possible to solve these $n$ equations for the $n$ unknowns $x_1^*, x_2^*, \ldots, x_n^*$. Even when we cannot explicitly solve these equations, we can often learn a lot about the nature of the solution by examining the first-order conditions.

Although finding the first-order conditions is generally straightforward, there are a few pitfalls that students can avoid by using the following recipe:

Deriving the First-Order Conditions

1. Calculate the $n$ first-order partial derivatives $\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i}$ for $i = 1, 2, \ldots, n$.

2. Put $*$'s on all the $x_i$'s in 1 and set each partial derivative equal to zero.

3.
If possible solve for x¤1 ; x¤2 ; : : : x¤n or if not examine the …rst-order conditions for anything you can learn about the optimal values. Example 1: Consider the function: f (x1 ; x2 ) = 3x21 ¡ 6x1 x2 + 5x22 ¡ 4x1 ¡ 2x2 + 8: 1. Calculating the …rst derivatives we …nd: @f (x1 ; x2 ) @x1 @f (x1 ; x2 ) @x2 = 6x1 ¡ 6x2 ¡ 4 = ¡6x1 + 10x2 ¡ 2: CHAPTER 4. MULTIVARIATE CALCULUS 220 2. Putting ¤ 0 s on the xi 0 s in 1: and setting these derivatives equal to 0 results in: @f (x¤1 ; x¤2 ) @x1 @f (x¤1 ; x¤2 ) @x2 = 6x¤1 ¡ 6x¤2 ¡ 4 = 0 = ¡6x¤1 + 10x¤2 ¡ 2 = 0: 3. We can solve the …rst-order conditions since in matrix notation we obtain: · ¸· ¤ ¸ · ¸ 6 ¡6 x1 4 = : ¡6 10 2 x¤2 So that using Cramer’s rule we …nd that: · ¸ · 4 ¡6 6 det det 2 10 ¡6 13 · ¸ = ; x¤2 = · x¤1 = 6 6 ¡6 6 det det ¡6 10 ¡6 ¸ 4 2 3 ¸= : 2 ¡6 10 Example 2: Consider the function: y = f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 ¡ x2 : 3 3 Following the recipe we have: 1. The …rst derivatives are: @f (x1 ; x2 ) @x1 @f (x1 ; x2 ) @x2 2. Putting ¤ 0 1 ¡ 1; 3x1 2 ¡ 1: 3x2 = = s on the xi 0 s in 1: and setting these derivatives equal to 0: @f (x¤1 ; x¤2 ) @x1 @f (x¤1 ; x¤2 ) @x2 3. Solving we …nd that: x¤1 = 1 3 = = 1 ¡ 1 = 0; 3x¤1 2 ¡ 1 = 0: 3x¤2 and x¤2 = 23 : Example 3: Consider the function: 1=4 1=2 f (x1 ; x2 ) = x1 x2 ¡ 3x1 ¡ 2x2 ; where x1 > 0 and x2 > 0: Following the recipe we have: CHAPTER 4. MULTIVARIATE CALCULUS 1. The …rst derivatives are: @f (x1 ; x2 ) @x1 @f (x1 ; x2 ) @x2 2. Putting ¤ 0 = = 221 1 ¡3=4 1=2 x x2 ¡ 3 4 1 1 1=4 ¡1=2 x x ¡ 2: 2 1 2 s on the xi 0 s in 1 and setting these derivatives equal to 0: @f (x¤1 ; x¤2 ) @x1 @f (x¤1 ; x¤2 ) @x2 = = 1 ¤ ¡3=4 ¤ 1=2 (x ) (x2 ) ¡ 3 = 0 4 1 1 ¤ 1=4 ¤ ¡1=2 (x ) (x2 ) ¡2 =0 2 1 3. We now attempt to solve the equations in 2 using the ln ( ) function to convert them into two linear equations as: (x¤1 ) ¡3=4 (x¤1 ) 1=4 (x¤2 )1=2 (x¤2 )¡1=2 3 1 = 12 =) ¡ ln (x¤1 ) + ln (x¤2 ) = ln (12) 4 2 1 1 = 4 =) ln (x¤1 ) ¡ ln (x¤2 ) = ln (4) 4 2 and hence: 1 3 ¡ y1¤ + y2¤ = ln (12) 4 2 1 ¤ 1 ¤ y ¡ y = ln (4) 4 1 2 2 where y1¤ = ln (x¤1 ) and y2¤ = ln (x¤2 ) : Writing this in matrix notation we obtain: ¸· ¤ ¸ · ¸ · 3 1 y1 ln (12) ¡4 2 = : 1 ¡ 12 ln (4) y2¤ 4 Using Cramer’s rule we …nd that: · ¸ 1 ln (12) 2 det ln (4) ¡ 12 · 3 ¸ = ¡2 ln (12) ¡ 2 ln (4) ; ln (x¤1 ) = y1¤ = 1 ¡4 2 det 1 ¡ 12 ¸ · 34 ¡ 4 ln (12) det 1 ln (4) · 43 ¸ = ¡3 ln (4) ¡ ln (12) : ln (x¤2 ) = y2¤ = 1 ¡4 2 det 1 ¡ 12 4 Thus ¤ 1 2304 1 : = 768 x¤1 = ey1 = e¡2 ln(12)¡2 ln(4) = x¤2 = ey2 = e¡3 ln(4)¡ln(12) ¤ CHAPTER 4. MULTIVARIATE CALCULUS 222 Example 4: Suppose a perfectly competitive …rm has a technology: Q = F (L; K) which is globally concave. Pro…ts expressed as a function of L and K are given by: ¼ (L; K) = P F (L; K) ¡ W L ¡ RK: Following the recipe we have: 1. The …rst derivatives are: @¼ (L; K) @L @¼ (L; K) @K @F (L; K) ¡W @L @F (L; K) = P ¡ R: @K = P 2. Putting ¤ 0 s on L and K in 1 and setting these derivatives equal to 0 results in: @¼ (L¤ ; K ¤ ) @L @¼ (L¤ ; K ¤ ) @K @F (L¤ ; K ¤ ) ¡W =0 @L @F (L¤ ; K ¤ ) = P ¡ R = 0: @K = P 3. Given the level of generality there is no hope of explicitly solving for L¤ and K ¤ here. We can nevertheless learn something about how a competitive …rm chooses L¤ and K since: W @F (L¤ ; K ¤ ) ¡ W = 0 =) MPL (L¤ ; K ¤ ) = ´w @L P @F (L¤ ; K ¤ ) R ¡ R = 0 =) MPK (L¤ ; K ¤ ) = ´r P @K P P where w and r are the real wage rate and real rental cost of capital. 
Thus the competitive …rm chooses L¤ and K ¤ to equate the marginal products of labour and capital with the real wage w and the real rental cost of capital. Example 5: Consider pro…t maximization in the long-run with a Cobb-Douglas production function: 1 1 Q = F (L; K) = L 2 K 4 with P = 8; W = 5 and R = 4. The pro…t function is then: ¼ (L; K) = P F (L; K) ¡ W L ¡ RK or: 1 1 ¼ (L; K) = 8L 2 K 4 ¡ 5L ¡ 4K: Following the recipe we have: CHAPTER 4. MULTIVARIATE CALCULUS 223 1. The …rst derivatives are: @¼ (L; K) @L @¼ (L; K) @K 2. Putting in: ¤ 0 1 1 = 4L¡ 2 K 4 ¡ 5 1 3 = 2L 2 K ¡ 4 ¡ 4: s on L and K and setting these derivatives equal to 0 results @¼ (L¤ ; K ¤ ) @L @¼ (L¤ ; K ¤ ) @K 1 1 1 1 = 4L¤¡ 2 K ¤ 4 ¡ 5 = 0 =) L¤¡ 2 K ¤ 4 = 1 3 1 5 4 3 = 2L¤ 2 K ¤¡ 4 ¡ 4 = 0 =) L¤ 2 K ¤¡ 4 = 2: 3. We now attempt to solve using the ln ( ) function to convert these equations into two linear equations as: µ ¶ 1 5 1 5 1 1 =) ¡ ln (L¤ ) + ln (K ¤ ) = ln L¤¡ 2 K ¤ 4 = 4 2 4 4 1 3 1 3 L¤ 2 K ¤¡ 4 = 2 =) ln (L¤ ) ¡ ln (K ¤ ) = ln (2) 2 4 or in matrix notation as · ¡ 12 1 2 1 4 ¡ 34 ¸· ¤ l k¤ ¸ 2 =4 ln ¡5¢ 3 4 ln (2) 5 where l¤ = ln (L¤ ) and k¤ = ln (K ¤ ) : Solving these two equations by matrix inversion (or by using Cramer’s rule) we obtain: ¡ ¢ 3 2 · ¤ ¸ · 1 ¸¡1 · ¡ 5 ¢ ¸ ¡3 ln 54 ¡ ln (2) 1 l ¡2 ln 4 4 5: =4 = 1 ¡5¢ k¤ ¡ 34 ln (2) 2 ¡2 ln 4 ¡ 2 ln (2) Thus 4.4.2 32 125 4 = : 25 L¤ = e¡3 ln( 4 )¡ln(2) = K¤ = e¡2 ln( 4 )¡2 ln(2) 5 5 Second-Order Conditions A solution x¤1 ; x¤2 ; : : : x¤n to the …rst-order conditions can be either a maximum or a minimum. Clearly we want to be able to know if x¤1 ; x¤2 ; : : : x¤n is a maximum or a minimum. For example if we are interested in pro…t maximization we do not want to be at a point which minimizes pro…ts. CHAPTER 4. MULTIVARIATE CALCULUS 224 As with univariate calculus, the second-order conditions rely on the fact that a maximum occurs at the top of a mountain (concavity) while a minimum occurs at the bottom of a valley (convexity). We therefore use the matrix of second derivatives or the Hessian: H (x1 ; x2 ; : : : xn ) to determine if x¤1 ; x¤2 ; : : : x¤n is the a maximum or a minimum. As before the weaker condition of local concavity (convexity) yields the weaker result of a local maximum (minimum) while the stronger condition of global concavity (convexity) yields the stronger result of a global maximum (minimum). 
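These definiteness checks are straightforward to mechanize. The sketch below is a rough numerical illustration, assuming the Python library NumPy; the helper names `leading_principal_minors` and `classify` are ours, not a standard API. It applies the sign tests to the Hessian of Example 1 (continued), which appears just below.

```python
import numpy as np

def leading_principal_minors(H):
    """Return [M1, M2, ..., Mn], the leading principal minors of H."""
    return [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]

def classify(H):
    """Apply the leading-principal-minor sign tests to a symmetric matrix."""
    M = leading_principal_minors(H)
    if all(m > 0 for m in M):
        return "positive definite (convex: a minimum)"
    if all(m > 0 if k % 2 == 0 else m < 0 for k, m in enumerate(M, start=1)):
        return "negative definite (concave: a maximum)"
    return "not definite by this test"

# Hessian of Example 1 (continued) below:
# f = 3x1^2 - 6x1x2 + 5x2^2 - 4x1 - 2x2 + 8
H = np.array([[6.0, -6.0], [-6.0, 10.0]])
print(leading_principal_minors(H))  # approximately [6.0, 24.0]
print(classify(H))                  # positive definite (convex: a minimum)
```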
We have: Theorem 261 Local Maximum: If x¤1 ; x¤2 ; : : : x¤n satis…es the …rst-order conditions and f (x1 ; x2 ; : : : xn ) is locally concave at x¤1 ; x¤2 ; : : : x¤n so that: H (x¤1 ; x¤2 ; : : : x¤n ) is negative de…nite, then x¤1 ; x¤2 ; : : : x¤n is a local maximum of f (x1 ; x2 ; : : : xn ) : Theorem 262 Local Minimum: If x¤1 ; x¤2 ; : : : x¤n satis…es the …rst-order conditions and f (x1 ; x2 ; : : : xn ) is locally convex at x¤1 ; x¤2 ; : : : x¤n so that: H (x¤1 ; x¤2 ; : : : x¤n ) is positive de…nite, then x¤1 ; x¤2 ; : : : x¤n is a local minimum of f (x1 ; x2 ; : : : xn ) : Theorem 263 If x¤1 ; x¤2 ; : : : x¤n satis…es the …rst-order conditions and f (x1 ; x2 ; : : : xn ) is globally concave so that H (x1 ; x2 ; : : : xn ) is a negative de…nite matrix for all x1 ; x2 ; : : : xn ; then x¤1 ; x¤2 ; : : : x¤n is the unique global maximum of f (x1 ; x2 ; : : : xn ) : Theorem 264 If x¤1 ; x¤2 ; : : : x¤n satis…es the …rst-order conditions and f (x1 ; x2 ; : : : xn ) is globally convex so that H (x1 ; x2 ; : : : xn ) is a positive de…nite matrix for all x1 ; x2 ; : : : xn ; then x¤1 ; x¤2 ; : : : x¤n is the unique global minimum of f (x1 ; x2 ; : : : xn ) : Example 1 (continued): For the function: f (x1 ; x2 ) = 3x21 ¡ 6x1 x2 + 5x22 ¡ 4x1 ¡ 2x2 + 8: 3 ¤ we showed that x¤1 = 13 6 x2 = 2 is a solution to the …rst-order conditions. To determine if this is a maximum or a minimum we need: @ 2 f (x1 ; x2 ) @x21 2 @ f (x1 ; x2 ) @x2 @x1 @ 2 f (x1 ; x2 ) @x22 = = = @ (6x1 ¡ 6x2 ¡ 4) = 6 @x1 @ (6x1 ¡ 6x2 ¡ 4) = ¡6 @x2 @ (¡6x1 + 10x2 ¡ 2) = 10 @x2 so that the Hessian of f (x1 ; x2 ) is: H (x1 ; x2 ) = · 6 ¡6 ¡6 10 ¸ : Note for this particular problem the Hessian does not depend on x1 and x2 : CHAPTER 4. MULTIVARIATE CALCULUS 225 Now from the leading principal minors we have: M1 M1 = det [6] = 6 > 0 · ¸ 6 ¡6 = det = 24 > 0: ¡6 10 It follows then that H (x1 ; x2 ) is positive de…nite for all x1 ; x2 and hence f (x1 ; x2 ) 3 ¤ is globally convex. Therefore x¤1 = 13 6 x2 = 2 is a global minimum. Example 2 (continued): For the function: y = f (x1 ; x2 ) = 1 2 ln (x1 ) + ln (x2 ) ¡ x1 ¡ x2 3 3 1 3 we showed that the solution to the …rst-order conditions is: x¤1 = The Hessian is given by: " # ¡ 3x12 0 1 H (x1 ; x2 ) = : 0 ¡ 3x22 and x¤2 = 23 : 2 At x¤1 = 1 3 and x¤2 = 2 3 we have: 2 µ ¶ ¡ 11 2 1 2 3( 3 ) 4 ; H = 0 ¡ 3 3 0 2 2 3( 23 ) 3 5= · ¡3 0 0 ¡ 32 ¸ : which is a negative de…nite matrix (since M1 = ¡3 < 0 and M2 = 92 > 0 ) so that it follows that x¤1 = 13 and x¤2 = 23 is a local maximum. We can in fact make the stronger statement that x¤1 = 13 and x¤2 = 23 is a global maximum since the Hessian is a negative de…nite matrix for all x1 ; x2 since it is a diagonal matrix with negative elements along the diagonal. Thus: x¤1 = 13 and x¤2 = 23 is a global maximum. Example 3 (continued): For the function: 1=4 1=2 f (x1 ; x2 ) = x1 x2 ¡ 3x1 ¡ 2x2 ; we showed that the solution to the …rst-order conditions is x¤1 = We have: 1 ¤ 2304 ; x2 = 1 768 : 3 ¡7=4 1=2 @ 2 f (x1 ; x2 ) 1 1=4 ¡3=2 @ 2 f (x1 ; x2 ) 1 ¡3=4 ¡3=2 @ 2 f (x1 ; x2 ) = ¡ x1 x2 ; = ¡ x1 x2 ; = x1 x2 2 2 @x1 16 @x2 4 @x1 @x2 8 so that the Hessian is given by: " 3 ¡7=4 1=2 ¡ 16 x1 x2 H (x1 ; x2 ) = 1 ¡3=4 ¡3=2 x x2 8 1 1 ¡3=4 ¡3=2 x2 8 x1 1=4 ¡3=2 ¡ 14 x1 x2 # : CHAPTER 4. 
MULTIVARIATE CALCULUS Substituting x¤1 = H µ 1 1 ; 2304 768 ¶ = = 1 2304 " · and x¤2 = into H (x1 ; x2 ) we …nd that: ¢¡7=4 ¡ 1 ¢1=2 1 2304 ¢¡3=4 ¡ 1 768 ¢¡1=2 1 2304 768 3 ¡ 16 ¡ 1 8 1 768 226 ¡ ¡5184 1152 1152 ¡58982 ¸ : ¢¡3=4 ¡ 1 ¢¡1=2 1 2304 ¡ 1 ¢1=4 ¡ 768 ¢ 1 ¡5=2 ¡ 14 2304 768 1 8 ¡ # This matrix is negative de…nite from the leading principal minors since: M1 M2 = ¡5184 < 0 · ¸ ¡5184 1152 = det = 304435584 > 0: 1152 ¡58982 1 1 It follows then that x¤1 = 2304 , x¤2 = 768 is a local maximum. 1 1 ¤ and x¤2 = 768 is a global maximum We can in fact prove more, that x1 = 2304 by showing that f (x1 ; x2 ) is globally concave. To do this we need to show that H (x1 ; x2 ) is negative de…nite for all x1 ; x2 : Using leading principal minors we have: 3 ¡7=4 1=2 x x2 < 0 16" 1 # 3 ¡7=4 1=2 1 ¡3=4 ¡3=2 ¡ 16 x1 x2 x x 2 8 1 = det 1 ¡3=4 ¡3=2 1 1=4 ¡5=2 ¡ x x x x 1 1 2 2 8 4 1 ¡6=4 ¡4=2 3 ¡6=4 ¡4=2 x ¡ x1 x2 x2 = 64 1 64 2 ¡6=4 ¡4=2 x = x2 >0 64 1 M1 = ¡ M2 so that H (x1 ; x2 ) is a negative de…nite matrix for all x1 and x2 : Thus f (x1 ; x2 ) 1 1 ; x¤2 = 768 is globally concave and x¤1 = 2304 is the unique global maximum. Example 4 (continued): Given the problem of maximizing pro…ts: ¼ (L; K) = P F (L; K) ¡ W L ¡ RK where the production function: Q = F (L; K) is globally concave so that its Hessian: " 2 # 2 HF (L; K) = @ F (L;K) @L2 @ 2 F (L;K) @L@K @ F (L;K) @L@K @ 2 F (L;K) @K 2 is negative de…nite for all L and K: The Hessian of the pro…t function is then: " 2 # " # 2 2 @ ¼(L;K) @ 2 ¼(L;K) (L;K) F (L;K) P @ F@L P @ @L@K 2 2 @L @L@K = H¼ (L; K) = 2 2 @ 2 ¼(L;K) @ 2 ¼(L;K) F (L;K) (L;K) P @ @L@K P @ F@K 2 @L@K @K 2 " 2 # 2 = P @ F (L;K) @L2 @ 2 F (L;K) @L@K @ F (L;K) @L@K @ 2 F (L;K) @K 2 = P HF (L; K) : CHAPTER 4. MULTIVARIATE CALCULUS 227 Since HF (L; K) is negative de…nite for all (L; K) (since F (L; K) is concave), and since P > 0; it follows that H¼ (L; K) is also negative de…nite for all (L; K). Thus ¼ (L; K) is globally concave and hence the L¤ ; K ¤ which solves the …rstorder conditions is the unique global maximum. Example 5: Consider the long-run pro…t maximization problem with: 1 1 ¼ (L; K) = 8L 2 K 4 ¡ 5L ¡ 4K: 32 4 ; K ¤ = 25 : We showed that the solution to the …rst-order conditions is: L¤ = 125 We would like to show that this is in fact a global pro…t maximum. The Hessian of ¼ (L; K) is: · 3 1 1 3 ¸ ¡2L¡ 2 K 4 L¡ 2 K ¡ 4 H (L; K) = : 1 3 1 7 L¡ 2 K ¡ 4 ¡ 32 L 2 K ¡ 4 Using leading principal minors we have: M1 M2 3 1 = ¡2L¡ 2 K 4 < 0 3 1 3 3 = 3L¡1 K 2 ¡ L¡ 2 K ¡ 4 = 2L¡1 K 2 > 0: Thus H (L; K) is negative de…nite for all L and K so that ¼ (L; K) is globally 32 4 and K ¤ = 25 is a global maximum. concave and hence: L¤ = 125 4.5 4.5.1 Quasi-Concavity and Quasi-Convexity Ordinal and Cardinal Properties Just as with univariate functions, multivariate functions have both ordinal and cardinal properties de…ned in exactly the same manner: De…nition 265 Ordinal Property: An ordinal property of a function f (x1 ; x2 ; : : : xn ) is one which remains unchanged when any monotonic transformation g (x) is applied; that is both f (x1 ; x2 ; : : : xn ) and g (f (x1 ; x2 ; : : : xn )) share the property for any g (x) with: g0 (x) > 0: De…nition 266 Cardinal Property: A cardinal property of a function f (x1 ; x2 ; : : : xn ) is one which does change a monotonic transformation is applied. Just as with univariate functions, the global maximum or minimum is an ordinal property while concavity or convexity is a cardinal property. 
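A small numerical illustration of this distinction, assuming NumPy (it previews the exponential example used later in this section): the concave function $h(x_1,x_2) = -\frac{1}{2}(x_1^2 + x_2^2)$ and its monotonic transform $e^{h}$ share the same maximizer $(0,0)$, which is the ordinal property, but the transform is not concave, which is the cardinal one.

```python
import numpy as np

def hessian_exp_h(x1, x2):
    """Hessian of f(x1,x2) = exp(-(x1^2 + x2^2)/2), computed analytically."""
    e = np.exp(-(x1**2 + x2**2) / 2)
    return e * np.array([[x1**2 - 1, x1 * x2],
                         [x1 * x2, x2**2 - 1]])

# At the shared maximizer (0,0) the Hessian of exp(h) is -I: negative definite.
print(np.linalg.eigvalsh(hessian_exp_h(0, 0)))  # [-1., -1.]

# At (2,2) it has a positive eigenvalue, so exp(h) is not concave,
# even though h itself has Hessian -I everywhere.
print(np.linalg.eigvalsh(hessian_exp_h(2, 2)))  # approximately [-0.018, 0.128]
```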
Theorem 267 A Global Maximum or Minimum: x¤1 ; x¤2 ; : : : x¤n is an Ordinal Property; that is x¤1 ; x¤2 ; : : : x¤n is a global maximum or minimum of f (x1 ; x2 ; : : : xn ) if and only if it is also a global maximum or minimum of g (f (x1 ; x2 ; : : : xn )) with g 0 (x) > 0: CHAPTER 4. MULTIVARIATE CALCULUS 228 Theorem 268 Concavity and Convexity are Cardinal Properties: If f (x1 ; x2 ; : : : xn ) is globally concave or convex it does not follow that g (f (x1 ; x2 ; : : : xn )) (with g 0 (x) > 0) is globally concave or convex. Again this leads us to de…ne a weaker notion of concavity or convexity which is an ordinal property: De…nition 269 Quasi-Concavity: A function f (x1 ; x2 ; : : : xn ) is quasi-concave if and only if it can be written as: f (x1 ; x2 ; : : : xn ) = g (h (x1 ; x2 ; : : : xn )) with g 0 (x) > 0 and where h (x1 ; x2 ; : : : xn ) is globally concave. De…nition 270 Quasi-Convexity: A function f (x1 ; x2 ; : : : xn ) is quasi-convex if and only if it can be written as: f (x1 ; x2 ; : : : xn ) = g (h (x1 ; x2 ; : : : xn )) with g 0 (w) > 0 and h (x1 ; x2 ; : : : xn ) is globally convex. We have: Theorem 271 Quasi-Concavity and Quasi-Convexity are Ordinal Properties. Just as with univariate functions, one can show that a given function is quasi-concave (quasi-convex) by …nding a monotonic transformation g (x) which makes the function concave (convex). Thus: Theorem 272 A function f (x1 ; x2 ; : : : xn ) is quasi-concave (quasi-convex) if and only if there exists a monotonic transformation g (x) such that h (x1 ; x2 ; : : : xn ) = g (f (x1 ; x2 ; : : : xn )) is globally concave (globally convex). Example: Consider the function: f (x1 ; x2 ) = x21 x42 for x1 > 0 and x2 > 0: You can verify that the Hessian Hf (x1 ; x2 ) of f (x1 ; x2 ) is given by: · ¸ 2x42 8x1 x32 Hf (x1 ; x2 ) = : 8x1 x32 12x21 x22 CHAPTER 4. MULTIVARIATE CALCULUS 229 The function x21 x42 is not concave since the diagonal elements of Hf (x1 ; x2 ) are both positive. The function x21 x42 is also not convex since H (x1 ; x2 ) is not positive de…nite since: · ¸ 2x42 8x1 x32 M2 = det [Hf (x1 ; x2 )] = det 8x1 x32 12x21 x22 = ¡40x62 x21 < 0: We can however show that x21 x42 is quasi-concave. We will do this two di¤erent ways. First we show that f (x1 ; x2 ) is a monotonic function of a concave function since: f (x1 ; x2 ) = x21 x42 = e2 ln(x1 )+4 ln(x2 ) so that the monotonic transformation is g (x) = ex (with g 0 (x) = ex > 0 ) and the concave function is: h (x1 ; x2 ) = 2 ln (x1 ) + 4 ln (x2 ) since the Hessian of h (x1 ; x2 ) is: Hh (x1 ; x2 ) = " ¡ x22 1 0 0 ¡ x42 2 # which is negative de…nite for all (x1 ; x2 ) (since it is a diagonal matrix with negative diagonal elements). We conclude then that x21 x42 is quasi-concave. Now we show that f (x1 ; x2 ) is quasi-concave by …nding a monotonic transformation g (x) which transforms f (x1 ; x2 ) into a concave function. To this end let: g (x) = ln (x) with g 0 (x) = 1 >0 x so that: ¢ ¡ h (x1 ; x2 ) = g (f (x1 ; x2 )) = ln x21 x42 = 2 ln (x1 ) + 4 ln (x2 ) : We have already shown that h (x1 ; x2 ) is globally concave and so the quasiconcavity of x21 x42 follows. 4.5.2 Su¢cient Conditions for a Global Maximum or Minimum To insure that a solution to the …rst-order conditions x¤1 ; x¤2 ; : : : x¤n is a global maximum (minimum) we do not necessarily need concavity (convexity), we only need the weaker conditions of quasi-concavity (quasi-convexity). 
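Before stating these results, here is a quick symbolic check of the preceding example, a sketch assuming the SymPy library is available.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)

# f = x1^2 * x2^4 is not concave, but h = ln(f) = 2 ln x1 + 4 ln x2 is:
# its Hessian is diagonal with negative entries, hence negative definite,
# and so f is quasi-concave.
h = 2 * sp.log(x1) + 4 * sp.log(x2)
print(sp.hessian(h, (x1, x2)))  # Matrix([[-2/x1**2, 0], [0, -4/x2**2]])

# By contrast the Hessian of f itself fails the definiteness tests: M2 < 0.
f = x1**2 * x2**4
Hf = sp.hessian(f, (x1, x2))
print(sp.simplify(Hf.det()))    # -40*x1**2*x2**6
```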
In particular:

Theorem 273 Suppose that $x_1^*, x_2^*, \ldots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \ldots, x_n)$ is quasi-concave; then $x_1^*, x_2^*, \ldots, x_n^*$ is the unique global maximum of $f(x_1, x_2, \ldots, x_n)$.

Theorem 274 Suppose that $x_1^*, x_2^*, \ldots, x_n^*$ satisfies the first-order conditions and $f(x_1, x_2, \ldots, x_n)$ is quasi-convex; then $x_1^*, x_2^*, \ldots, x_n^*$ is the unique global minimum of $f(x_1, x_2, \ldots, x_n)$.

Example: Consider a scaled version of the bivariate standard normal distribution:
\[ f(x_1,x_2) = e^{-\frac{1}{2}(x_1^2 + x_2^2)}. \]
To find the first-order conditions we first calculate:
\[ \frac{\partial f(x_1,x_2)}{\partial x_1} = -x_1 \times e^{-\frac{1}{2}(x_1^2 + x_2^2)}, \qquad \frac{\partial f(x_1,x_2)}{\partial x_2} = -x_2 \times e^{-\frac{1}{2}(x_1^2 + x_2^2)}. \]
Now we put $*$'s on the $x_i$'s and solve for $x_1^*$ and $x_2^*$ as:
\[ \frac{\partial f(x_1^*,x_2^*)}{\partial x_1} = 0 = -x_1^* \times e^{-\frac{1}{2}(x_1^{*2} + x_2^{*2})} \implies x_1^* = 0 \]
\[ \frac{\partial f(x_1^*,x_2^*)}{\partial x_2} = 0 = -x_2^* \times e^{-\frac{1}{2}(x_1^{*2} + x_2^{*2})} \implies x_2^* = 0. \]
We would like to show that $x_1^* = 0$, $x_2^* = 0$ is a global maximum. The Hessian of $f(x_1,x_2)$ is:
\[ H(x_1,x_2) = e^{-\frac{1}{2}(x_1^2 + x_2^2)}\begin{bmatrix} x_1^2 - 1 & x_1 x_2 \\ x_1 x_2 & x_2^2 - 1 \end{bmatrix} \]
and for $x_1^* = 0$, $x_2^* = 0$ we have:
\[ H(0,0) = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]
which is negative definite, and so we conclude that $x_1^* = 0$, $x_2^* = 0$ is a local maximum.

We cannot use concavity to show that $x_1^* = 0$, $x_2^* = 0$ is a global maximum, since at $x_1 = 2$, $x_2 = 2$ we have:
\[ H(2,2) = e^{-4}\begin{bmatrix} 3 & 4 \\ 4 & 3 \end{bmatrix} \]
which is not negative definite (since it has positive elements on the diagonal). It follows that $f(x_1,x_2)$ is not concave.

We can, however, show that $f(x_1,x_2)$ is quasi-concave. Note that:
\[ f(x_1,x_2) = e^{-\frac{1}{2}(x_1^2 + x_2^2)} = g(h(x_1,x_2)) \]
with monotonic function $g(x) = e^x$ and concave function:
\[ h(x_1,x_2) = -\frac{1}{2}\left(x_1^2 + x_2^2\right) \]
since its Hessian is given by:
\[ \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]
which is a diagonal matrix with negative elements along the diagonal and hence is negative definite for all $x_1, x_2$. It follows then that $f(x_1,x_2)$ is quasi-concave and hence that $x_1^* = 0$, $x_2^* = 0$ is a global maximum for both $f(x_1,x_2)$ and $h(x_1,x_2)$.

4.5.3 Indifference Curves and Quasi-Concavity

Suppose a household has a utility function $U(Q_1,Q_2)$ where the marginal utilities of goods 1 and 2 are:
\[ \frac{\partial U(Q_1,Q_2)}{\partial Q_1} \equiv U_1 > 0, \qquad \frac{\partial U(Q_1,Q_2)}{\partial Q_2} \equiv U_2 > 0 \]
and the second derivatives are:
\[ U_{11} = \frac{\partial^2 U(Q_1,Q_2)}{\partial Q_1^2}, \quad U_{22} = \frac{\partial^2 U(Q_1,Q_2)}{\partial Q_2^2}, \quad U_{12} = \frac{\partial^2 U(Q_1,Q_2)}{\partial Q_1 \partial Q_2}. \]
We define an indifference curve from the utility function as follows:

Definition 275 Indifference Curve: An indifference curve corresponding to utility level $c$, written as $Q_2 = f(Q_1)$, is all combinations of $Q_1, Q_2$ which yield $c$ units of utility, or:
\[ U(Q_1, f(Q_1)) = c. \]

We define the slope of the indifference curve as:

Definition 276 Marginal Rate of Substitution: The marginal rate of substitution is:
\[ MRS(Q_1,Q_2) = f'(Q_1). \]

Definition 277 We say that the indifference curve has a diminishing marginal rate of substitution if $f''(Q_1) > 0$.

Example: Given the Cobb-Douglas utility function:
\[ U(Q_1,Q_2) = Q_1^2 Q_2^4 \]
an indifference curve which yields 5 units of utility is defined by:
\[ U(Q_1,Q_2) = Q_1^2 Q_2^4 = 5 \implies Q_2 = 5^{1/4} Q_1^{-1/2} \]
so that the indifference curve is $Q_2 = 5^{1/4} Q_1^{-1/2}$. [Figure: plot of the indifference curve $Q_2 = 5^{1/4} Q_1^{-1/2}$.]

Note that this indifference curve has the correct shape: it is downward sloping and convex, or bent towards the origin. It therefore exhibits a diminishing marginal rate of substitution.
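As a check, the following SymPy sketch confirms that this curve yields 5 units of utility everywhere along it, and that $f'(Q_1) < 0$ and $f''(Q_1) > 0$ for $Q_1 > 0$.

```python
import sympy as sp

Q1 = sp.symbols('Q1', positive=True)

# Indifference curve for 5 units of utility from U = Q1^2 * Q2^4
f = 5**sp.Rational(1, 4) * Q1**sp.Rational(-1, 2)

print(sp.simplify(Q1**2 * f**4))       # 5: utility is constant along the curve
print(sp.simplify(sp.diff(f, Q1)))     # -5**(1/4)/(2*Q1**(3/2)) < 0: downward sloping
print(sp.simplify(sp.diff(f, Q1, 2)))  # 3*5**(1/4)/(4*Q1**(5/2)) > 0: diminishing MRS
```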
You can verify that this is true for all indi¤erence curves where the indi¤erence curve corresponding to c units of utility is: 1 ¡1 Q2 = f (Q1 ) = c 4 Q1 2 : We can show that all indi¤erence curves slope downwards and show that the slope or marginal rate of substitution is equal to the negative of the ratio of the marginal utilities: Theorem 278 Given a utility function U (Q1 ; Q2 ) the marginal rate of substitution is: @U(Q1 ;Q2 ) @Q1 1 ;Q2 ) @Q2 f 0 (Q1 ) = ¡ @U(Q =¡ Proof. An indi¤erence curve is de…ned as: U (Q1 ; f (Q1 )) = c: U1 < 0: U2 CHAPTER 4. MULTIVARIATE CALCULUS 233 Let U (Q1 ; Q2 ) be the outside function and let g1 (Q1 ) = Q1 and g2 (Q1 ) = f (Q1 ) be the two inside functions. Di¤erentiating both sides of with respect to Q1 and using the chain rule we …nd that: @U (g1 (Q1 ) ; g2 (Q1 )) 0 @U (g1 (Q1 ) ; g2 (Q1 )) 0 g1 (Q1 ) + g2 (Q1 ) = 0: @Q1 @Q2 Since g10 (Q1 ) = 1 and g20 (Q1 ) = f 0 (Q1 ) we have: @U (Q1 ; f (Q1 )) @U (Q1 ; f (Q1 )) 0 + f (Q1 ) = 0 @Q1 @Q2 and since Q2 = f (Q1 ) we can write this as: @U (Q1 ; Q2 ) @U (Q1 ; Q2 ) 0 + f (Q1 ) = 0 @Q1 @Q2 so that solving for f 0 (Q1 ) and using U1 > 0 and U2 > 0 the result follows. Now suppose we take a monotonic transformation of U (Q1 ; Q2 ) as: ~ (Q1 ; Q2 ) = g (U (Q1 ; Q2 )) U ~ (Q1 ; Q2 )? where g0 (x) > 0. We might then ask what kind of utility function is U ~ (Q1 ; Q2 ) The quite surprising answer is that all practical purposes U (Q1 ; Q2 ) and U are the same utility function! Actual economic behavior depends on the indif~ (Q1 ; Q2 ) are ference curves and the indi¤erence curves of U (Q1 ; Q2 ) and U identical; that is: ~ (Q1 ; Q2 ) = g (c) : U (Q1 ; Q2 ) = c , U or all combinations Q1 ; Q2 which yield c units of utility given U (Q1 ; Q2 ) also ~ (Q1 ; Q2 ) : The only di¤erence is that with one yield g (c) units of utility given U utility function the indi¤erence curve has the utility number c associated with it while the other has the utility number g (c) associated with it. Actual economic behavior though does not depend on what utility numbers we attach to each ~ (Q1 ; Q2 ) both represent the same indi¤erence curve and so U (Q1 ; Q2 ) and U preferences and hence economic behavior. We thus have: Theorem 279 An indi¤erence curve of a utility function U (Q1 ; Q2 ) is an ordinal property of U (Q1 ; Q2 ) : Example: Given the Cobb-Douglas utility function: U (Q1 ; Q2 ) = Q21 Q42 if we transform it with g (x) = ln (x) (with g0 (x) = x1 > 0) then ¢ ¡ ~ (Q1 ; Q2 ) = ln Q21 Q42 = 2 ln (Q1 ) + 4 ln (Q2 ) U CHAPTER 4. MULTIVARIATE CALCULUS 234 is an equivalent utility function and has the same indi¤erence curves. Thus if ~ (Q1 ; Q2 ) which yields ln (5) units of we calculate the indi¤erence curve for U utility we obtain: ~ (Q1 ; Q2 ) U = 2 ln (Q1 ) + 4 ln (Q2 ) = ln (5) 1 ¡1 =) Q21 Q42 = 5 =) Q2 = 5 4 Q1 2 : which is the identical indi¤erence curve for 5 units of utility for U (Q1 ; Q2 ) = Q21 Q42 that we calculated above. In fact all of the utility functions below lead to the same indi¤erence curves: 1 2 1 5 2 5 Q21 Q42 ; Q13 Q23 ; eQ1 Q2 ; 1 2 ln (Q1 ) + ln (Q2 ) ; 2 ln (Q1 ) + 4 ln (Q2 ) : 3 3 The question now is under what circumstances do indi¤erence curves exhibit a diminishing marginal rate of substitution? A good …rst guess would be the concavity of the utility function. Although this is su¢cient it cannot be necessary. 
For example we have seen that U (Q1 ; Q2 ) = Q21 Q42 exhibits a diminishing marginal rate of substitution even though U (Q1 ; Q2 ) = Q21 Q42 is not concave since concavity requires that the diagonal elements of the Hessian be negative while: U11 = 2Q22 > 0; U22 = 12Q21 Q22 > 0: A necessary and su¢cient condition is in fact quasi-concavity. We have: Theorem 280 The utility function U (Q1 ; Q2 ) exhibits a diminishing marginal rate of substitution if and only if it is quasi-concave. Example: Despite not being concave, the utility function U (Q1 ; Q2 ) = Q21 Q42 is quasi-concave since: ¢ ¡ ~ (Q1 ; Q2 ) = ln Q2 Q4 = 2 ln (Q1 ) + 4 ln (Q4 ) U 1 2 ~ (Q1 ; Q2 ) is globally concave. and U We can obtain the following calculus test for the quasi-concavity of the utility function: Theorem 281 A utility function U (Q1 ; Q2 ) with U1 > 0 and U2 > 0 is quasiconcave if and only if det [H] > 0 where 2 3 0 U1 U2 det [H] = det 4 U1 U11 U12 5 U2 U12 U22 2 = ¡U22 U11 + 2U12 U1 U2 ¡ U12 U22 : CHAPTER 4. MULTIVARIATE CALCULUS 235 Proof. Using the multivariate chain rule on @U (Q1 ; f (Q1 )) @U (Q1 ; f (Q1 )) 0 + f (Q1 ) = 0 @Q1 @Q2 we …nd that: U11 + 2U12 f 0 (Q1 ) + U22 f 0 (Q1 )2 + U2 f 00 (Q1 ) = 0: 1 Substituting f 0 (Q1 ) = ¡ U U2 we obtain: U1 2 U11 ¡ 2U12 + U22 U2 µ U1 U2 ¶2 + U2 f 00 (Q1 ) = 0 from which it follows that: à µ ¶2 ! 1 U1 U 1 2 f 00 (Q1 ) = ¡ + U22 U11 ¡ 2U12 U2 U2 U2 ¢ 1 ¡ 2 ¡U22 U11 + 2U12 U1 U2 ¡ U12 U22 = 3 U2 1 det [H] : = U23 Since U2 > 0 it follows that f 00 (Q1 ) > 0 if and only if det [H] > 0: Remark 1: This matrix is sometimes referred to as the bordered Hessian. It contains the ordinary Hessian of U (Q1 ; Q2 ) in the lower right-hand corner and is bordered by the …rst derivatives on either side with a 0 in the upper left-hand corner. Example: For U (Q1 ; Q2 ) = Q21 Q42 the bordered Hessian is: 2 0 H = 4 2Q1 Q42 4Q21 Q32 2Q1 Q42 2Q42 8Q1 Q32 and (with some work) you can show that: 3 4Q21 Q32 8Q1 Q32 5 12Q21 Q22 det [H] = 48Q41 Q10 2 >0 so that, as we already knew, U (Q1 ; Q2 ) is quasi-concave. CHAPTER 4. MULTIVARIATE CALCULUS 4.6 236 Constrained Optimization Economics whether normative or positive, has not simply been the study of the allocation of scarce resources, it has been the study of the rational allocation of scarce resources. -Herbert A. Simon Typically in economics when rational agents attempt to maximize pro…ts or utility, or to minimize costs or expenditure, they are not free to choose any one of the variables they control. Instead they face some constraint that restricts the choices they can make. This is because economics is about scarcity and scarcity imposes constraints on economies and agents. For example a household maximizing utility cannot choose any bundle of goods might want, but can only choose from amongst those bundles that it can a¤ord; that is which satisfy the household’s budget constraint. This leads to a new kind of optimization problem from what we have considered so far where instead of working directly with the objective function f (x1 ; x2 ; : : : xn ) we construct a new function, the Lagrangian: L (¸; x1 ; x2 ; : : : xn ) and work with this function instead. Economists work with Lagrangians all the time. In a way it is the most important mathematical technique for you to learn if you want to go on in economics. 4.6.1 The Lagrangian Suppose we wish to maximize or minimize a multivariate function f (x1 ; x2 ; : : : xn ) subject to a constraint. 
The constraint is written as: g (x1 ; x2 ; : : : xn ) = 0: This means that in maximizing or minimizing f (x1 ; x2 ; : : : xn ) we can only choose those x1 ; x2 ; : : : xn which make g (x1 ; x2 ; : : : xn ) equal to zero. To do this we construct the Lagrangian, which is a function of n+1 variables: the Lagrange multiplier ¸; which is a scalar, and the n components x1 ; x2 ; : : : xn : We have: De…nition 282 Corresponding to the problem of maximizing or minimizing the objective function: f (x1 ; x2 ; : : : xn ) CHAPTER 4. MULTIVARIATE CALCULUS 237 subject to the constraint g (x1 ; x2 ; : : : xn ) = 0 is the Lagrangian given by: L (¸; x1 ; x2 ; : : : xn ) = f (x1 ; x2 ; : : : xn ) + ¸g (x1 ; x2 ; : : : xn ) : At the beginning students often make mistakes in constructing the Lagrangian. To avoid these errors consider the following step-by-step recipe: A Recipe for Constructing the Lagrangian 1. Identify the objective function, the function to be maximized or minimized: f (x1 ; x2 ; : : : xn ) : 2. Identify the constraint and, if necessary, rewrite the constraint in the form of ______ = 0: 3. Write the Lagrangian function using L with the …rst argument the Lagrange multiplier ¸ followed by the xi 0 s . We thus write: L (¸; x1 ; x2 ; : : : xn ) = 4. After the equality sign in 3: write the objective function: f (x1 ; x2 ; : : : xn ) followed by +¸ and then brackets as: L (¸; x1 ; x2 ; : : : xn ) = f (x1 ; x2 ; : : : xn ) + ¸ ( ): 5. Inside the brackets in 4: put the expression to on left-hand side of the constraint written as ___ = 0 as: 1 0 C B (¸; x1 ; x2 ; : : : xn ) = f (x1 ; x2 ; : : : xn ) + ¸ @ _________ A {z } | left-side of _=0 from 2: which then gives the Lagrangian. Example 1: Consider the problem of minimizing 1 x21 + x22 2 subject to the constraint that x1 and x2 sum to 1 or that: x1 + x2 = 1: CHAPTER 4. MULTIVARIATE CALCULUS 238 1. We …rst identify what is the constraint and what is to be maximized. Here a typical error would be to confuse: x1 + x2 ; which forms part of the constraint, with: x21 + 12 x22 which is the objective function. The objective function is what we wish to minimize: 1 f (x1 ; x2 ) = x21 + x22 : 2 2. The constraint is that x1 + x2 = 1: We need to rewrite this as: ___ = 0: This is easily done by putting x1 + x2 on the other side of the equal sign as: x1 + x2 = 1 =) 1 ¡ x1 ¡ x2 = 0 so that g (x1 ; x2 ) is given by: g (x1 ; x2 ) = 1 ¡ x1 ¡ x2 = 0: 3. Here the Lagrangian is a function of ¸ and x1 and x2 so we write: L (¸; x1 ; x2 ) = : 4. After the equal sign in 3: we write the objective function from 1 followed by +¸ ( ) as: L (¸; x1 ; x2 ) = 1 x21 + x22 | {z2 } + ¸ (_____) : ob jective function 5. Inside the brackets we place the left-hand side of the constraint written as ___ = 0: Thus from 2: we have: 1 0 1 L (¸; x1 ; x2 ) = x21 + x22 + ¸ @1 ¡ x1 ¡ x2 A : | {z } 2 from 2 Example 2: Suppose a household wishes to maximize utility: U (Q1 ; Q2 ) : where Q1 and Q2 are the amounts of good 1 and good 2 that the household consumes. The household has income Y , the price of Q1 is P1 and the price of Q2 is P2 so that the budget constraint is: Y = P1 Q1 + P2 Q2 : CHAPTER 4. MULTIVARIATE CALCULUS 239 We need to rewrite this as g (Q1 ; Q2 ) = 0: This can be done in a number of ways. 
Here we will use: Y = P1 Q1 + P2 Q2 =) Y ¡ P1 Q1 ¡ P2 Q2 = 0 so that the constraint is: g (Q1 ; Q2 ) = Y ¡ P1 Q1 ¡ P2 Q2 = 0: The Lagrangian is therefore: L (¸; Q1 ; Q2 ) = U (Q1 ; Q2 ) | {z } ob jective function 1 0 + ¸ @Y ¡ P1 Q1 ¡ P2 Q2 A : | {z } constraint Example 3: Suppose now that we have the particular utility function: U (Q1 ; Q2 ) = 0:3 ln (Q1 ) + 0:7 ln (Q2 ) and as above the budget constraint is: Y = P1 Q1 + P2 Q2 or: g (Q1 ; Q2 ) = Y ¡ P1 Q1 ¡ P2 Q2 = 0: The Lagrangian is therefore: 1 0 C B L (¸; Q1 ; Q2 ) = 0:3 ln (Q1 ) + 0:7 ln (Q2 ) + ¸ @Y ¡ P1 Q1 ¡ P2 Q2 A : | {z } {z } | =g(Q1 ;Q2 ) =U(Q1 ;Q2 ) Example 4: Suppose a …rm has a Cobb-Douglas production function: 1 3 Q = F (L; K) = L 2 K 4 : The …rm’s objective is to minimize the cost of producing Q units. Thus the objective function is costs: W L + RK where W is the wage and R the rental cost of capital. The constraint is that L and K must produce Q units of output (otherwise L = K = 0 minimizes costs!) so that Q = F (L; K) is the constraint. Using: 1 3 Q = F (L; K) =) Q ¡ L 2 K 4 = 0 CHAPTER 4. MULTIVARIATE CALCULUS 240 we can rewrite the constraint as g (L; K) = 0 where: 1 3 g (L; K) = Q ¡ L 2 K 4 = 0: We therefore have the Lagrangian for cost minimization as: 1 0 1 3 L (¸; L; K) = W + RK} + ¸ @Q ¡ L 2 K 4 A : | L {z | {z } ob jective constraint Example 5: Now consider the more general problem of a …rm with a production function Q = F (L; K) wishes to minimize cost of producing Q units. The Lagrangian is then: 1 0 L (¸; L; K) = W + RK} + ¸ @Q ¡ F (L; K)A : | L {z {z } | ob jective 4.6.2 constraint First-Order Conditions For constrained optimization we have, just as before, …rst-order conditions. Now however the relevant …rst-order conditions are not with respect to f (x1 ; x2 ; : : : xn ) but with respect to the Lagrangian: L (¸; x1 ; x2 ; : : : xn ) = f (x1 ; x2 ; : : : xn ) + ¸g (x1 ; x2 ; : : : xn ) : We have: Theorem 283 Suppose x¤1 ; x¤2 ; : : : x¤n either maximizes or minimizes the objective function: f (x1 ; x2 ; : : : xn ) subject to the constraint g (x1 ; x2 ; : : : xn ) = 0: Then there is a ¸¤ such that: @L (¸¤ ; x¤1 ; x¤2 ; : : : x¤n ) = g (x¤1 ; x¤2 ; : : : x¤n ) = 0 @¸ @L (¸¤ ; x¤1 ; x¤2 ; : : : x¤n ) @f (x¤1 ; x¤2 ; : : : x¤n ) @g (x¤1 ; x¤2 ; : : : x¤n ) = + ¸¤ = 0; @xi @xi @xi i = 1; 2; : : : n: Remark 1: There are n+1 …rst-order conditions leading to n+1 equations in n+ 1 unknowns: ¸¤ ; x¤1 ; x¤2 ; : : : x¤n : Since there are as many equations as unknowns, it should be possible to solve them for ¸¤ ; x¤1 ; x¤2 ; : : : x¤n : Remark 2: It is important that the Lagrange multiplier ¸ also have a ¤ : To solve the …rst-order conditions you must solve for ¸¤ : Thus in essence then there is no di¤erence between the treatment of the xi 0 s and ¸: CHAPTER 4. 
MULTIVARIATE CALCULUS 241 Remark 3: The …rst of the …rst-order conditions: @L (¸¤ ; x¤1 ; x¤2 ; : : : x¤n ) = g (x¤1 ; x¤2 ; : : : x¤n ) = 0 @¸ insures that x¤1 ; x¤2 ; : : : x¤n satis…es the constraint: Example 1 (continued): From the Lagrangian: 1 L (¸; x1 ; x2 ) = x21 + x22 + ¸ (1 ¡ x1 ¡ x2 ) 2 @L we need to calculate three partial derivatives: @L @¸ ; @x1 and @L @¸ = = @L @x1 = = @L @x1 = = Putting a ¤ @L @x2 : We thus have: ¶ µ @ 1 x21 + x22 + ¸ (1 ¡ x1 ¡ x2 ) @¸ 2 1 ¡ x1 ¡ x2 ¶ µ @ 1 x21 + x22 + ¸ (1 ¡ x1 ¡ x2 ) @x1 2 2x1 ¡ ¸ ¶ µ @ 1 x21 + x22 + ¸ (1 ¡ x1 ¡ x2 ) @x1 2 x2 ¡ ¸: on ¸; x1 x2 and setting the derivatives equal to zero we obtain: 1 ¡ x¤1 ¡ x¤2 2x¤1 ¡ ¸¤ x¤2 ¡ ¸¤ = 0 =) x¤1 + x¤2 = 1 1 = 0 =) x¤1 = ¸¤ 2 = 0 =) x¤2 = ¸¤ : Using the …rst result and adding up the second and third results we have: 1 1 3 x¤1 + x¤2 = ¸¤ + ¸¤ = ¸¤ 2 2 3 ¤ =) 1 = ¸ 2 2 ¤ =) ¸ = 3 = Now that we have ¸¤ we can solve for x¤1 and x¤2 as: x¤1 x¤2 1 1 ¤ 1 2 ¸ = £ = 2 2 3 3 2 ¤ = ¸ = : 3 = Thus the solution is ¸¤ = 23 ; x¤1 = 1 3 and x¤2 = 23 : CHAPTER 4. MULTIVARIATE CALCULUS 242 Example 2 (continued): For the utility maximization problem we obtained the Lagrangian: L (¸; Q1 ; Q2 ) = U (Q1 ; Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 ) : @L We need to calculate three partial derivatives: @L @¸ ; @Q1 and @L @¸ = = @L @Q1 = = @L @Q2 = = @L @Q2 as: @ (U (Q1 ; Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 )) @¸ Y ¡ P1 Q1 ¡ P2 Q2 @ (U (Q1 ; Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 )) @Q1 @U (Q1 ; Q2 ) ¡ ¸P1 @Q1 @ (U (Q1 ; Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 )) @x1 @U (Q1 ; Q2 ) ¡ ¸P2 : @Q2 Setting ¸ = ¸¤ ; Q1 = Q¤1 ; Q2 = Q¤2 and setting the partial derivatives equal to zero we obtain three …rst-order conditions: Y ¡ P1 Q¤1 ¡ P2 Q¤2 @U (Q¤1 ; Q¤2 ) ¡ ¸¤ P1 @Q1 @U (Q¤1 ; Q¤2 ) ¡ ¸¤ P2 @Q2 = 0 = 0 = 0: Note that the …rst condition insures that: P1 Q¤1 + P2 Q¤2 = Y so that Q¤1 and Q¤2 satisfy the budget constraint. Since we have made no assumptions about U (Q1 ; Q2 ) we cannot hope to solve these three equations directly for ¸¤ ; Q¤1 ; Q¤2 : We can however use these equations to learn something about the nature of the optimal decision rule for the household. From the second and third of the …rst-order conditions we have: @U (Q¤1 ; Q¤2 ) ¡ ¸¤ P1 @Q1 = =) 0 =) M U1 (Q¤1 ; Q¤2 ) = ¸¤ P1 MU1 (Q¤1 ; Q¤2 ) = ¸¤ P1 and @U (Q¤1 ; Q¤2 ) ¡ ¸¤ P2 @Q2 = =) 0 =) M U2 (Q¤1 ; Q¤2 ) = ¸¤ P2 MU2 (Q¤1 ; Q¤2 ) = ¸¤ : P2 CHAPTER 4. MULTIVARIATE CALCULUS 243 From these two results we conclude that: M U2 (Q¤1 ; Q¤2 ) M U1 (Q¤1 ; Q¤2 ) = = ¸¤ : P1 P2 This says that a rational household will allocate its income Y between Q1 and Q2 so as to equate the ratio of each good’s marginal utility to its price. This is the familiar condition from introductory economics. In introductory however MU2 1 one does not answer the question, what are MU P1 and P2 are equal to? The ¤ answer is ¸ ; the Lagrange multiplier. Later you will learn that ¸¤ is in fact the marginal utility of income. 
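These first-order conditions are also easy to mechanize. The following sketch, assuming the SymPy library, reproduces the solution of Example 1 (continued) above by solving the three first-order conditions directly.

```python
import sympy as sp

lam, x1, x2 = sp.symbols('lambda x1 x2')

# Lagrangian of Example 1: minimize x1^2 + x2^2/2 subject to x1 + x2 = 1
L = x1**2 + sp.Rational(1, 2) * x2**2 + lam * (1 - x1 - x2)

# The three first-order conditions: dL/dlambda = dL/dx1 = dL/dx2 = 0
foc = [sp.diff(L, v) for v in (lam, x1, x2)]
print(sp.solve(foc, (lam, x1, x2)))
# {lambda: 2/3, x1: 1/3, x2: 2/3}
```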
Example 3 (continued): For the Lagrangian: L (¸; Q1 ; Q2 ) = 0:3 ln (Q1 ) + 0:7 ln (Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 ) we need to calculate three partial derivatives: @L @¸ = = @L @Q1 = = @L @Q2 = = @L @L @¸ ; @Q1 and @L @Q2 as: @ (0:3 ln (Q1 ) + 0:7 ln (Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 )) @¸ Y ¡ P1 Q1 ¡ P2 Q2 @ (0:3 ln (Q1 ) + 0:7 ln (Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 )) @Q1 0:3 ¡ ¸P1 Q1 @ (0:3 ln (Q1 ) + 0:7 ln (Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 )) @Q2 0:7 ¡ ¸P2 : Q2 Setting ¸ = ¸¤ ; Q1 = Q¤1 ; Q2 = Q¤2 and setting the partial derivatives equal to zero we obtain three …rst-order conditions:three …rst-order conditions: Y ¡ P1 Q¤1 ¡ P2 Q¤2 0:3 ¡ ¸¤ P1 Q¤1 0:7 ¡ ¸¤ P2 Q¤2 = 0 = 0 = 0: We have three equations with three unknowns. To solve them take the second and third equations to obtain: 0:3 ¡ ¸¤ P1 Q¤1 0:7 ¡ ¸¤ P2 Q¤2 0:3 ¸¤ P1 0:7 = 0 =) Q¤2 = ¤ : ¸ P2 = 0 =) Q¤1 = CHAPTER 4. MULTIVARIATE CALCULUS 244 We are not yet at the solution because we still do not know ¸¤ : Substituting these two into the budget constraint we obtain: µ ¶ µ ¶ 0:3 0:7 Y = P1 Q¤1 + P2 Q¤2 = P1 + P 2 ¸¤ P1 ¸¤ P2 0:3 0:7 = + ¤ ¸¤ ¸ 1 = : ¸¤ From this it follows that: ¸¤ = Y1 : This says that the marginal utility of income decreases with income, or richer people get less utility out of an extra dollar than poorer people. We then have: Q¤1 = Q¤2 = 0:3 0:3 0:3Y = 1 = and ¸¤ P1 P1 P Y 1 0:7 0:7 0:7Y = 1 = ¸¤ P2 P2 P Y 2 ¤ so that Q¤1 = 0:3Y P1 is the demand curve for good 1 and Q2 = curve for good 2: 0:7Y P2 is the demand Example 4 (continued): From the Lagrangian: ´ ³ 1 3 L (¸; L; K) = W L + RK + ¸ Q ¡ L 2 K 4 we obtain the …rst-order conditions: @L (¸¤ ; L¤ ; K ¤ ) @¸ @L (¸¤ ; L¤ ; K ¤ ) @L @L (¸¤ ; L¤ ; K ¤ ) @K 1 3 = Q ¡ L¤ 2 K ¤ 4 = 0 1 3 1 = W ¡ ¸¤ L¤¡ 2 K ¤ 4 = 0 2 3 1 1 = R ¡ ¸¤ L¤ 2 K ¤¡ 4 = 0: 4 Using the ln ( ) function we can convert these into a system of 3 linear equations in 3 unknowns as: 1 3 1 3 Q ¡ L¤ 2 K ¤ 4 = 0 =) ln (L¤ ) + ln (K ¤ ) = ln (Q) 2 4 1 ¤ ¤¡ 1 ¤ 3 1 3 ¤ W ¡ ¸ L 2 K 4 = 0 =) ln (¸ ) ¡ ln (L¤ ) + ln (K ¤ ) = ln (2W ) 2 2 4 µ ¶ 1 1 4 3 ¤ ¤ 1 ¤¡ 1 ¤ ¤ R R ¡ ¸ L 2 K 4 = 0 =) ln (¸ ) + ln (L ) ¡ ln (K ¤ ) = ln 4 2 4 3 which can be written in matrix notation as: 2 3 2 32 3 3 1 ln (Q) 0 ln (¸¤ ) 2 4 3 54 4 1 ¡1 5 ln (L¤ ) 5 = 4 ln (2W 2 4 ¡ 4 ¢) : 1 1 ¤ ln 3 R ln (K ) ¡4 1 2 CHAPTER 4. MULTIVARIATE CALCULUS 245 You can verify that 2 1 0 2 det 4 1 ¡ 12 1 1 2 3 4 3 4 ¡ 14 3 5 = 5: 4 Using Cramer’s rule we then …nd that: 2 3 3 1 ln (Q) 2 4 3 5 1 det 4 ln (2W 4 ¡ 4 ¢) ¡ 21 µ ¶ 1 ¡4 ln 3 R 2 3 4 1 ¤ 2 ln (¸ ) = R = ¡ ln (Q) + ln (2W ) + ln 5 5 5 5 3 4 2 3 3 0 ln (Q) 4 1 5 det 4 1 ln (2W 4 ¡ 4 ¢) µ ¶ 1 ln 3 R ¡ 34 3 3 4 4 ¤ ln (L ) = R = ln (Q) ¡ ln (2W ) + ln 5 5 5 5 3 4 2 3 1 0 ln (Q) 2 5 det 4 1 ¡ 12 ln (2W ¡ 4 ¢) µ ¶ 1 ln R 1 2 2 4 4 2 3 ln (K ¤ ) = ln (Q) + ln (2W ) ¡ ln R = 5 5 5 5 3 4 from which it follows that: ¤ ¸ L¤ K¤ ¶ 35 4 R = Q (2W ) 3 µ ¶ 35 3 4 4 R = Q 5 (2W )¡ 5 3 µ ¶¡ 25 2 4 4 R = Q 5 (2W ) 5 : 3 ¡ 15 2 5 µ From these we can work out the …rm’s cost function as: C ¤ (Q; W; R) = W L¤ + RK ¤ õ ¶ 3 µ ¶ 2 ! 4 2 3 2 5 3 5 = + Q5 W 5 R5 : 3 2 Let us now note some patterns that are generally true. Note that the Lagrange multiplier ¸¤ turns out to be marginal cost; that is: 2 @C ¤ (Q; W; R) 1 = ¸¤ = Q¡ 5 (2W ) 5 @Q µ 4 R 3 ¶ 35 : The fact that marginal cost falls with Q re‡ects the increasing returns to scale of this technology. L¤ and K ¤ are the conditional factor demands for L and K; CHAPTER 4. 
MULTIVARIATE CALCULUS 246 that is conditional on the …rm producing an output level Q this is the optimal (cost minimizing) amount of labour and capital that they would demand. Note L¤ and K ¤ here are not the same as the ordinary demand and supply curves for labour which are based on pro…t maximization and which have arguments P; W and R and not Q; W and R as here. It is also the case that: @C ¤ (Q; W; R) @W @C ¤ (Q; W; R) @R ¶ 35 4 = L = Q (2W ) R 3 µ ¶¡ 25 2 4 4 = K ¤ = Q 5 (2W ) 5 R : 3 ¤ 4 5 ¡ 35 µ These two results are examples of Shephard’s lemma. Example 5 (continued): Given the Lagrangian from the cost minimization problem: L (¸; L; K) = W L + RK + ¸ (Q ¡ F (L; K)) we have the …rst-order conditions for cost minimization: @L (¸¤ ; L¤ ; K ¤ ) @¸ @L (¸¤ ; L¤ ; K ¤ ) @L @L (¸¤ ; L¤ ; K ¤ ) @K = Q ¡ F (L¤ ; K ¤ ) = 0 @F (L¤ ; K ¤ ) =0 @L @F (L¤ ; K ¤ ) = R ¡ ¸¤ =0 @K = W ¡ ¸¤ The …rst condition insures that: Q = F (L¤ ; K ¤ ) so that L¤ and K ¤ produce Q units of output. From the second and third conditions, and recalling that @F (L¤ ; K ¤ ) @F (L¤ ; K ¤ ) = M PL (L¤ ; K ¤ ) ; = M PK (L¤ ; K ¤ ) @L @K it follows from the second and third …rst-order conditions that: M PK (L¤ ; K ¤ ) M PL (L¤ ; K ¤ ) 1 = : ¤ = ¸ W R 4.6.3 Second-Order Conditions As with unconstrained optimization any solution to the …rst-order conditions can be either a maximum or a minimum. We can determine if ¸¤ ; x¤1 ; x¤2 ; : : : x¤n is a local maximum or minimum by examining the Hessian of the Lagrangian CHAPTER 4. MULTIVARIATE CALCULUS 247 given by: 2 6 6 6 6 H (¸; x1 ; x2 ; : : : xn ) = 6 6 6 4 2 6 6 6 6 = 6 6 6 6 4 @2 L @¸2 @2 L @¸@x1 @2 L @¸@x2 @2 L @¸@x1 @2 L @x21 @2 L @x1 @x2 .. . .. . 2 2 @ L @¸@xn @ 2 f (x) @x21 @ 2 f (x) @x1 @x2 .. . @g(x) @xn .. . 2 @ L @x1 @xn 0 @g(x) @x1 @g(x) @x2 @2L @¸@x2 @2L @x1 @x2 @2L @x22 @ 2 f (x) @x1 @xn @ L @x2 @xn @g(x) @x1 2 g(x) + ¸ @ @x 2 2 1 @ g(x) + ¸ @x 1 @x2 .. . @ 2 g(x) + ¸ @x 1 @xn ¢¢¢ ¢¢¢ ¢¢¢ .. . ¢¢¢ @2L @¸@xn @2L @x1 @xn @2L @x2 @xn .. . 2 @ L @x2n @ 2 f (x) @x1 @x2 @ 2 f (x) @x22 @ 2 f (x) @x2 @xn @g(x) @x2 3 7 7 7 7 7 7 7 5 2 @ g(x) + ¸ @x 1 @x2 2 g(x) + ¸ @ @x 2 2 .. . @ 2 g(x) + ¸ @x 2 @xn ¢¢¢ ¢¢¢ ¢¢¢ .. . ¢¢¢ Remark 1: Note the zero in the upper left-hand corner of H (¸; x1 ; x2 ; : : : xn ) : This occurs because the Lagrangian is a linear function of ¸ so that: L (¸; x1 ; x2 ; : : : xn ) = f (x1 ; x2 ; : : : xn ) + ¸g (x1 ; x2 ; : : : xn ) @L (¸; x1 ; x2 ; : : : xn ) = g (x1 ; x2 ; : : : xn ) =) @¸ @ 2 L (¸; x1 ; x2 ; : : : xn ) @ g (x1 ; x2 ; : : : xn ) = 0: =) = 2 @¸ @¸ Remark 2: Note that the partial derivatives @g(x) @xi i = 1; 2; : : : n of the constraint function along the border of the Hessian. For this reason H (¸; x) is sometimes referred to as the bordered Hessian. Remark 3: Since neither a positive de…nite nor a negative de…nite matrix can have a 0 along the diagonal, it follows that L (¸; x1 ; x2 ; : : : xn ) is neither concave nor convex. It follows that the second-order conditions cannot be the same as with unconstrained optimization. Another way of seeing this point is that for the …rst leading principal minor M1 = 0 always. Thus M1 tells us nothing about whether we have a maximum or a minimum. Since the …rst diagonal element of the Hessian is 0, it follows that L (¸; x1 ; x2 ; : : : xn ) is neither concave nor convex and so the second-order conditions cannot be the same as with unconstrained optimization. 
This point is reinforced by the fact that the second leading principal minor is µ ¶2 @g (x1 ; x2 ; : : : xn ) <0 M2 = ¡ @x1 @ 2 f (x) @x1 @xn @ 2 f (x) @x2 @xn @ 2 f(x) @x2n 3 @g(x) @xn @ 2 g(x) 7 7 + ¸ @x 1 @xn 7 @ 2 g(x) 7 + ¸ @x2 @xn 7 : 7 7 .. 7 . 5 2 g(x) + ¸ @ @x 2 n CHAPTER 4. MULTIVARIATE CALCULUS 248 and so this also tells us nothing about whether ¸¤ ; x¤1 ; x¤2 ; : : : x¤n corresponds to a maximum or a minimum. It is only at the third leading principal minor M3 where the Hessian begins to tell us something about we have a maximum or a minimum. In particular let M3 ; M4 ; M5 ; : : : be the leading principal minors of H (¸¤ ; x¤1 ; x¤2 ; : : : x¤n ) : We then have: Theorem 284 Suppose that ¸¤ ; x¤1 ; x¤2 ; : : : x¤n satisfy the …rst-order conditions from the Lagrangian and that the leading principal minors of the Hessian H (¸¤ ; x¤1 ; x¤2 ; : : : x¤n ) satisfy: M3 > 0; M4 < 0 ; M5 > 0 ¢ ¢ ¢ then ¸¤ ; x¤1 ; x¤2 ; : : : x¤n corresponds to a constrained local maximum. Theorem 285 Suppose that ¸¤ ; x¤1 ; x¤2 ; : : : x¤n satisfy the …rst-order conditions from the Lagrangian and that the leading principal minors of the Hessian H (¸¤ ; x¤1 ; x¤2 ; : : : x¤n ) satisfy: M3 < 0; M4 < 0 ; M5 < 0 ¢ ¢ ¢ then ¸¤ ; x¤1 ; x¤2 ; : : : x¤n corresponds to a constrained local minimum. Remark: Evaluating the leading principal minors of bordered Hessians is often a tedious business. Most if not all of the examples that we will consider involve the case where there are n = 2 independent variables or xi 0 s so that the Hessian is a 3 £ 3 matrix. In this case one need only calculate the determinant of the Hessian itself and check that it is positive for a maximum or negative for a minimum. In particular we have: Theorem 286 If n = 2 then the solution to the …rst-order conditions: ¸¤ ; x¤1 ; x¤2 represents a local constrained maximum if M3 > 0 and a local constrained minimum if: M3 < 0: Example 1 (continued): From the Lagrangian: 1 L (¸; x1 ; x2 ) = x21 + x22 + ¸ (1 ¡ x1 ¡ x2 ) 2 CHAPTER 4. MULTIVARIATE CALCULUS 249 the Hessian is calculated from the second derivatives of the Lagrangian as: @ 2 L (¸; x1 ; x2 ) @¸2 2 @ L (¸; x1 ; x2 ) @¸@x1 2 @ L (¸; x1 ; x2 ) @¸@x2 @ 2 L (¸; x1 ; x2 ) @x21 @ 2 L (¸; x1 ; x2 ) @x1 @x2 @ 2 L (¸; x1 ; x2 ) @x22 @ (1 ¡ x1 ¡ x2 ) = 0 @¸ @ (1 ¡ x1 ¡ x2 ) = ¡1 @x1 @ (1 ¡ x1 ¡ x2 ) = ¡1 @x2 @ (2x1 ¡ ¸) = 2 @x1 @ (2x1 ¡ ¸) = 0 @x2 @ (x2 ¡ ¸) = 1 @x2 = = = = = = so that the Hessian is given by: 2 3 0 ¡1 ¡1 2 0 5: H (¸¤ ; x¤1 ; x¤2 ) = 4 ¡1 ¡1 0 1 Note that the Hessian for this problem does not depend on ¸; x1 and x2 . The second-order conditions for a minimum are then satis…ed since: 2 3 0 ¡1 ¡1 2 0 5 M3 = det [H (¸; x1 ; x2 )] = det 4 ¡1 ¡1 0 1 = ¡3 < 0: Example 2 (continued): For the utility maximization problem with the Lagrangian: L (¸; Q1 ; Q2 ) = U (Q1 ; Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 ) the Hessian is given by: where: U11 ´ 2 0 H (¸ ; Q¤1 ; Q¤2 ) = 4 ¡P1 ¡P2 ¤ ¡P1 U11 U12 3 ¡P2 U12 5 U22 @ 2 U (Q¤1 ; Q¤2 ) @ 2 U (Q¤1 ; Q¤2 ) @ 2 U (Q¤1 ; Q¤2 ) ; U12 ´ ; U22 ´ : 2 @Q1 @Q1 @Q2 @Q22 CHAPTER 4. MULTIVARIATE CALCULUS 250 In order for ¸¤ ; Q¤1 ; Q¤2 to be a utility maximum (and not a minimum!) we require: M3 = det [H (¸¤ ; Q¤1 ; Q¤2 )] = ¡P12 U22 + 2P1 P2 U12 ¡ P22 U11 > 0: This condition requires that the household’s indi¤erence curve be convex at ¸¤ ; Q¤1 ; Q¤2 so that there is a local diminishing marginal rate of substitution. 
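The determinant test is quick to verify numerically. The sketch below, assuming NumPy, carries it out for the bordered Hessian of Example 1 (continued) above.

```python
import numpy as np

# Bordered Hessian of the Lagrangian from Example 1 (continued):
#   L = x1^2 + x2^2/2 + lambda*(1 - x1 - x2)
# with rows and columns ordered (lambda, x1, x2), as in the text.
H = np.array([[0.0, -1.0, -1.0],
              [-1.0, 2.0, 0.0],
              [-1.0, 0.0, 1.0]])

# With n = 2 variables only M3 = det(H) matters: M3 < 0 indicates a local
# constrained minimum, M3 > 0 a local constrained maximum.
M3 = np.linalg.det(H)
print(round(M3))  # -3, so a local constrained minimum
```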
Example 3 (continued): The Hessian of the Lagrangian: L (¸; Q1 ; Q2 ) = 0:3 ln (Q1 ) + 0:7 ln (Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 ) at ¸¤ ; Q¤1 ; Q¤2 is given by: 2 0 6 ¡P1 ¤ ¤ ¤ H (¸ ; Q1 ; Q2 ) = 4 ¡P2 ¡P1 ¡ 0:3 2 (Q¤1 ) 0 ¡P2 0 ¡ 0:7 2 (Q¤2 ) 3 7 5: Since the Hessian is a 3 £ 3 matrix, we only have to calculate the determinant of H (¸¤ ; Q¤1 ; Q¤2 ) and verify that it is positive to show that ¸¤ ; Q¤1 ; Q¤2 correspond to a local maximum. Thus: M3 = det [H (¸¤ ; Q¤1 ; Q¤2 )] = 0:7P12 (Q¤2 )2 + 0:3P22 (Q¤1 )2 >0 and the second-order conditions for a local maximum are satis…ed. Example 4 (continued): From the Lagrangian: ´ ³ 1 3 L (¸; L; K) = W L + RK + ¸ Q ¡ L 2 K 4 the Hessian is given by: 2 1 0 H (¸; L; K) = 4 ¡ 12 L¡ 2 K 4 1 1 ¡ 34 L 2 K ¡ 4 1 3 3 ¡ 12 L¡ 2 K 4 3 1 ¡ 32 4 4 ¸L 1 K 1 3 ¡2 ¡4 ¡ 8 ¸L K 3 1 1 ¡ 34 L 2 K ¡ 4 1 1 ¡ 38 ¸L¡ 2 K ¡ 4 5 : 1 5 3 ¡4 2 16 ¸L K With some straightforward work it can be shown that: M3 = det [H (¸¤ ; L¤ ; K ¤ )] = ¡ 15 ¤ ¤¡ 1 ¤ 1 ¸ L 2K 4 < 0 32 (recall from the solution to the …rst-order conditions that ¸¤ > 0) and so ¸¤ ; L¤ ; K ¤ corresponds to a local minimum as required. Example 5 (continued): Given the Lagrangian L (¸; L; K) = W L + RK + ¸ (Q ¡ F (L; K)) CHAPTER 4. MULTIVARIATE CALCULUS 251 the Hessian is given by: 2 0 6 H (¸; L; K) = 4 ¡ @F (L;K) @L (L;K) ¡ @F@K ¡ @F (L;K) @L 2 (L;K) ¡¸ @ F@L 2 2 F (L;K) ¡¸ @ @L@K To make the notation more compact de…ne FL ´ FLL ´ @F (L¤ ; K ¤ ) @F (L¤ ; K ¤ ) ; FK ´ ; @L @K @ 2 F (L¤ ; K ¤ ) @ 2 F (L¤ ; K ¤ ) @ 2 F (L¤ ; K ¤ ) : ; FKK ´ and FLK ´ 2 2 @L @K @L@K The second-order conditions then require that: 2 0 ¡FL det [H (¸¤ ; L¤ ; K ¤ )] = det 4 ¡FL ¡¸¤ FLL ¡FK ¡¸¤ FLK or 3 (L;K) ¡ @F@K 2 F (L;K) 7 ¡¸ @ @L@K 5: 2 @ F (L;K) ¡¸ @K 2 3 ¡FK ¡¸¤ FLK 5 < 0 ¡¸¤ FKK ¡ ¢ 2 FLL < 0: ¸¤ FL2 FKK ¡ 2FL FK FLK + FK As with utility maximization, this condition requires that the isoquant be bent towards the origin. 4.6.4 Su¢cient Conditions for a Global Maximum or Minimum The second order conditions we have examined only guarantee a local constrained maximum or minimum; they do not guarantee that ¸¤ ; x¤1 ; x¤2 ; : : : x¤n will correspond to a global maximum or minimum. Like unconstrained optimization, we can insure that ¸¤ ; x¤1 ; x¤2 ; : : : x¤n is a global maximum or minimum by appealing to quasi-concavity or quasi-convexity, but now we need to examine the properties of both the objective function f (x1 ; x2 ; : : : xn ) and the constraint function g (x1 ; x2 ; : : : xn ) ; as well as the sign of the Lagrange multiplier ¸¤ : Almost all of the problems in economics one encounters at the intermediate level involve either a linear objective function f (x1 ; x2 ; : : : xn ) ; as in cost minimization, or a linear constraint function g (x1 ; x2 ; : : : xn ) ; as in utility maximization. For these cases we can use the following results: Theorem 287 If 1) f (x1 ; x2 ; : : : xn ) is quasi-concave, 2) the constraint is linear, that is it can be written as: g (x1 ; x2 ; : : : xn ) = a ¡ b1 x1 ¡ b2 x2 ¡ ¢ ¢ ¢ ¡ bn xn and if 3) ¸¤ > 0; then ¸¤ ; x¤1 ; x¤2 ; : : : x¤n corresponds to a constrained global maximum. CHAPTER 4. MULTIVARIATE CALCULUS 252 Theorem 288 If 1) f (x1 ; x2 ; : : : xn ) is quasi-convex, 2) the constraint is linear, that is it can be written as: g (x1 ; x2 ; : : : xn ) = a ¡ b1 x1 ¡ b2 x2 ¡ ¢ ¢ ¢ ¡ bn xn and if 3) ¸¤ > 0; then ¸¤ ; x¤1 ; x¤2 ; : : : x¤n corresponds to a constrained global minimum. 
Remark: In addition to requiring that the constraint be linear, note that we need to insure that the Lagrange multiplier ¸¤ > 0 or ¸¤ is positive. If you …nd that ¸¤ < 0 this might be because of the way that you wrote down the constraint. For example with utility maximization if you wrote the constraint as: g (Q1 ; Q2 ) = P1 Q1 + P2 Q2 ¡ Y = 0 you would obtain ¸¤ < 0 while if instead you used: g (Q1 ; Q2 ) = Y ¡ P1 Q1 ¡ P2 Q2 = 0 you would obtain ¸¤ > 0: Thus if you …nd ¸¤ < 0 you may be able to …x this problem by rewriting the constraint. Example 1 (continued): Consider the problem of minimizing: 1 f (x1 ; x2 ) = x21 + x22 2 subject to the constraint: g (x1 ; x2 ) = 1 ¡ x1 ¡ x2 = 0: We …rst show that f (x1 ; x2 ) is convex and hence that it is quasi-convex. This follows since its Hessian is given by: · ¸ 2 0 H (x1 ; x2 ) = 0 1 which is positive de…nite for all (x1 ; x2 ) : The second condition, that the constraint is linear is obviously satis…ed. Finally we showed that: ¸¤ = 2 >0 3 so the third condition is also satis…ed. It follows then that: x¤1 = is the global minimum for all x1 and x2 which satisfy: g (x1 ; x2 ) = 1 ¡ x1 ¡ x2 = 0 1 3 and x¤2 = 2 3 CHAPTER 4. MULTIVARIATE CALCULUS 253 or: x1 + x2 = 1: Example 2 (continued): Consider utility maximization where: L (¸; Q1 ; Q2 ) = U (Q1 ; Q2 ) + ¸ (Y ¡ P1 Q1 ¡ P2 Q2 ) : and where we assume that U (Q1 ; Q2 ) is quasi-concave so that the indi¤erence curves have the correct shape. Thus the …rst requirement for a global maximum is satis…ed by assumption. It is also the case that the constraint is linear since: g (Q1 ; Q2 ) = Y ¡ P1 Q1 ¡ P2 Q2 so the second requirement is also satis…ed. Now from the …rst-order conditions we have: @U (Q¤1 ; Q¤2 ) = ¸¤ P1 : M U1 (Q¤1 ; Q¤2 ) = @Q1 Since P1 > 0 and M U1 (Q¤1 ; Q¤2 ) > 0 it follows that ¸¤ > 0 so that the third requirement for a global maximum is satis…ed. We therefore conclude that ¸¤ ; Q¤1 ; Q¤2 correspond to a global maximum. Example 3 (continued): Consider the problem of maximizing: U (Q1 ; Q2 ) = 0:3 ln (Q1 ) + 0:7 ln (Q2 ) subject to the budget constraint: g (Q1 ; Q2 ) = Y ¡ P1 Q1 ¡ P2 Q2 = 0: We …rst show that U (Q1 ; Q2 ) is concave and hence that it is quasi-concave. This follows since the Hessian of U (Q1 ; Q2 ) is: " # ¡ (Q0:3)2 0 1 H (Q1 ; Q2 ) = 0 ¡ (Q0:7 2 2) is a diagonal matrix with negative diagonal elements and hence is negative de…nite for all Q1 and Q2 : Obviously the budget constraint is linear so that the second condition for a global maximum is also satis…ed. Finally we showed that 1 >0 Y so the third condition for a global maximum is satis…ed. Thus: ¸¤ = Q¤1 = 0:3Y 0:7Y ; Q¤2 = P1 P2 corresponds to a global maximum. The other class of problems one typically encounters is where the objective function is linear and the constraint is quasi-concave or quasi-convex. In this case we have: CHAPTER 4. MULTIVARIATE CALCULUS 254 Theorem 289 If 1) f (x1 ; x2 ; : : : xn ) is linear so that it can be written as: f (x1 ; x2 ; : : : xn ) = a + b1 x1 + b2 x2 + ¢ ¢ ¢ + bn xn ; 2) the constraint function g (x1 ; x2 ; : : : xn ) is quasi-concave (quasi-convex) and if 3) ¸¤ > 0 then ¸¤ ; x¤1 ; x¤2 ; : : : x¤n correspond to a constrained global minimum (maximum). Example 4 (continued): Consider the problem of minimizing cost: W L + RK subject to the constraint: 1 3 g (L; K) = Q ¡ L 2 K 4 = 0: Obviously the objective function is a linear function of L and K and hence the …rst condition for a global minimum is satis…ed. 
Example 4 (continued): Consider the problem of minimizing cost:
$$WL + RK$$
subject to the constraint:
$$g(L, K) = Q - L^{1/2} K^{3/4} = 0.$$
Obviously the objective function is a linear function of $L$ and $K$, and hence the first condition for a global minimum is satisfied.

The constraint function $g(L, K)$ is not convex since its Hessian is given by:
$$H(L, K) = \begin{bmatrix} \frac{1}{4} L^{-3/2} K^{3/4} & -\frac{3}{8} L^{-1/2} K^{-1/4} \\ -\frac{3}{8} L^{-1/2} K^{-1/4} & \frac{3}{16} L^{1/2} K^{-5/4} \end{bmatrix}$$
which is not positive definite since:
$$M_2 = \frac{3}{64} L^{-1} K^{-1/2} - \frac{9}{64} L^{-1} K^{-1/2} = -\frac{6}{64} L^{-1} K^{-1/2} < 0.$$
We can show, however, that $g(L, K)$ is quasi-convex since:
$$g(L, K) = Q - L^{1/2} K^{3/4} = Q - \exp\left(-\left(-\frac{1}{2}\ln(L) - \frac{3}{4}\ln(K)\right)\right) = r(s(L, K))$$
where the monotonic function is:
$$r(x) = Q - \exp(-x), \quad r'(x) = \exp(-x) > 0$$
and the function:
$$s(L, K) = -\frac{1}{2}\ln(L) - \frac{3}{4}\ln(K)$$
is convex since it has a Hessian:
$$H_s(L, K) = \begin{bmatrix} \frac{1}{2L^2} & 0 \\ 0 & \frac{3}{4K^2} \end{bmatrix}$$
which is positive definite for all $L$ and $K$. Finally, we showed that:
$$\lambda^* = Q^{-1/5} (2W)^{2/5} \left(\frac{4R}{3}\right)^{3/5} > 0$$
so that the third condition for a global minimum is satisfied. We therefore conclude that:
$$\lambda^* = Q^{-1/5} (2W)^{2/5} \left(\frac{4R}{3}\right)^{3/5}$$
$$L^* = Q^{4/5} (2W)^{-3/5} \left(\frac{4R}{3}\right)^{3/5}$$
$$K^* = Q^{4/5} (2W)^{2/5} \left(\frac{4R}{3}\right)^{-2/5}$$
correspond to a constrained global minimum.

Example 5 (continued): Consider the general cost minimization problem where the objective function is cost:
$$WL + RK$$
and the constraint is:
$$g(L, K) = Q - F(L, K) = 0$$
and where we assume that $F(L, K)$ is quasi-concave. (Assuming that $F(L, K)$ is quasi-concave is basically equivalent to assuming that the isoquants bend towards the origin.) The objective function, which is cost, is obviously linear, so the first requirement for a global minimum is satisfied. We now show that the constraint is quasi-convex.

Proof. If $F(L, K)$ is quasi-concave then by definition it can be written as:
$$F(L, K) = r(s(L, K))$$
where $r'(x) > 0$ and $s(L, K)$ is concave. It follows that:
$$g(L, K) = Q - F(L, K) = a(b(L, K))$$
where the monotonic function is:
$$a(x) = Q - r(-x), \quad a'(x) = r'(-x) > 0$$
and the convex function is:
$$b(L, K) = -s(L, K)$$
since the negative of the concave function $s(L, K)$ is convex. It follows then that $g(L, K) = Q - F(L, K)$ is quasi-convex.

Finally, we note from the first-order conditions for cost minimization that:
$$W = \lambda^* \frac{\partial F(L^*, K^*)}{\partial L}.$$
Since $W > 0$ and $\frac{\partial F(L^*, K^*)}{\partial L} > 0$, it follows that $\lambda^* > 0$, so the third requirement for a global minimum is satisfied. We conclude then that $\lambda^*, L^*, K^*$ correspond to a global minimum.

Sufficient Conditions when neither the objective function nor the constraint is linear

There are cases where neither the objective function $f(x_1, x_2, \ldots, x_n)$ nor the constraint $g(x_1, x_2, \ldots, x_n)$ is linear. In this case we have:

Theorem 290 If 1) both $f(x_1, x_2, \ldots, x_n)$ and $g(x_1, x_2, \ldots, x_n)$ in the Lagrangian:
$$L(\lambda, x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n) + \lambda g(x_1, x_2, \ldots, x_n)$$
are quasi-concave (quasi-convex), and 2) $\lambda^*, x_1^*, x_2^*, \ldots, x_n^*$ solve the first-order conditions from the Lagrangian with $\lambda^* > 0$, then $\lambda^*, x_1^*, x_2^*, \ldots, x_n^*$ correspond to a constrained global maximum (minimum).
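Before turning to a fully non-linear example, note that the closed-form expressions for $\lambda^*$, $L^*$ and $K^*$ in Example 4 involve enough algebra that a numerical cross-check is reassuring. The sketch below (an aside, not part of the text) picks illustrative values for $W$, $R$ and $Q$, which are assumptions for illustration only, and verifies that the formulas satisfy the first-order conditions:

```python
import numpy as np

# Illustrative (assumed) values for Example 4's cost-minimization problem
W, R, Q = 10.0, 6.0, 50.0

# Closed-form solution derived in the text
lam = Q**(-1/5) * (2*W)**(2/5)  * (4*R/3)**(3/5)
L   = Q**(4/5)  * (2*W)**(-3/5) * (4*R/3)**(3/5)
K   = Q**(4/5)  * (2*W)**(2/5)  * (4*R/3)**(-2/5)

# First-order conditions: W = lam*dF/dL, R = lam*dF/dK, and Q = F(L, K)
foc_L = W - lam * 0.5  * L**(-0.5) * K**0.75
foc_K = R - lam * 0.75 * L**0.5    * K**(-0.25)
constraint = Q - L**0.5 * K**0.75

print(np.allclose([foc_L, foc_K, constraint], 0.0))   # True
```

The check passes identically for any positive $W$, $R$ and $Q$, since the formulas solve the first-order conditions exactly.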
Example: Consider a country that produces two goods $Q_1$ and $Q_2$ with utility function:
$$U(Q_1, Q_2) = Q_1 Q_2$$
which it wishes to maximize. This is clearly non-linear. The production possibilities curve or constraint satisfies:
$$Q_1^2 + Q_2^2 = 1.$$

[Figure: the production possibilities curve in $(Q_1, Q_2)$ space]

The constraint can be written as:
$$g(Q_1, Q_2) = 1 - Q_1^2 - Q_2^2 = 0$$
and is also clearly non-linear. This leads to the Lagrangian:
$$L(\lambda, Q_1, Q_2) = Q_1 Q_2 + \lambda\left(1 - Q_1^2 - Q_2^2\right).$$
The first-order conditions are:
$$\frac{\partial L(\lambda^*, Q_1^*, Q_2^*)}{\partial \lambda} = 1 - (Q_1^*)^2 - (Q_2^*)^2 = 0 \implies (Q_1^*)^2 + (Q_2^*)^2 = 1$$
$$\frac{\partial L(\lambda^*, Q_1^*, Q_2^*)}{\partial Q_1} = Q_2^* - 2\lambda^* Q_1^* = 0 \implies (Q_2^*)^2 = 4(\lambda^*)^2 (Q_1^*)^2$$
$$\frac{\partial L(\lambda^*, Q_1^*, Q_2^*)}{\partial Q_2} = Q_1^* - 2\lambda^* Q_2^* = 0 \implies (Q_1^*)^2 = 4(\lambda^*)^2 (Q_2^*)^2.$$
Adding the second and third results we obtain:
$$(Q_1^*)^2 + (Q_2^*)^2 = 4(\lambda^*)^2\left((Q_2^*)^2 + (Q_1^*)^2\right) \implies \lambda^* = \frac{1}{2}.$$
Using $\lambda^* = \frac{1}{2}$ in the second condition we obtain:
$$Q_2^* - 2\lambda^* Q_1^* = 0 \implies Q_2^* = Q_1^*$$
so that using $Q_2^* = Q_1^*$ in the constraint yields:
$$(Q_1^*)^2 + (Q_2^*)^2 = 1 \implies Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}.$$
Thus the solution to the first-order conditions is $\lambda^* = \frac{1}{2}$, $Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}$.

We can show that $f(Q_1, Q_2)$ is quasi-concave since
$$f(Q_1, Q_2) = e^{\ln(Q_1) + \ln(Q_2)}$$
where $e^x$ is a monotonic transformation and $\ln(Q_1) + \ln(Q_2)$ is concave since it has a Hessian:
$$H_f(Q_1, Q_2) = \begin{bmatrix} -\frac{1}{Q_1^2} & 0 \\ 0 & -\frac{1}{Q_2^2} \end{bmatrix}$$
which is a diagonal matrix with all diagonal elements negative and hence is negative definite for all $Q_1, Q_2$. The constraint $g(Q_1, Q_2)$ is concave, and hence quasi-concave, since it has a Hessian:
$$H_g(Q_1, Q_2) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$$
which is negative definite for all $Q_1, Q_2$. Finally, $\lambda^* = \frac{1}{2} > 0$. Thus $f(Q_1, Q_2)$ and $g(Q_1, Q_2)$ are quasi-concave and $\lambda^* > 0$ is satisfied, so that $\lambda^* = \frac{1}{2}$, $Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}$ is a global maximum.

The constrained maximum $Q_1^* = Q_2^* = \frac{1}{\sqrt{2}}$ yields $U(Q_1^*, Q_2^*) = \frac{1}{2}$ units of utility and occurs where the indifference curve $Q_2 = \frac{1}{2Q_1}$ is just tangent to the production possibilities curve.

[Figure: the indifference curve $Q_2 = \frac{1}{2Q_1}$ tangent to the production possibilities curve at $Q_1 = Q_2 = \frac{1}{\sqrt{2}}$]

4.7 Econometrics

4.7.1 Linear Regression

Consider the simple linear regression model:
$$Y_i = \alpha + \beta X_i + e_i, \quad i = 1, 2, \ldots, n.$$
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are the values of $\alpha$ and $\beta$ which minimize the sum of squares function:
$$S(\alpha, \beta) = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2.$$
Using the sum and chain rules we have:
$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} = -2\sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)$$
$$\frac{\partial S(\alpha, \beta)}{\partial \beta} = -2\sum_{i=1}^{n} X_i (Y_i - \alpha - \beta X_i)$$
so that the first-order conditions for a minimum are:
$$\frac{\partial S(\hat{\alpha}, \hat{\beta})}{\partial \alpha} = -2\sum_{i=1}^{n}\left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right) = 0 \implies n\hat{\alpha} + \left(\sum_{i=1}^{n} X_i\right)\hat{\beta} = \sum_{i=1}^{n} Y_i$$
$$\frac{\partial S(\hat{\alpha}, \hat{\beta})}{\partial \beta} = -2\sum_{i=1}^{n} X_i\left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right) = 0 \implies \left(\sum_{i=1}^{n} X_i\right)\hat{\alpha} + \left(\sum_{i=1}^{n} X_i^2\right)\hat{\beta} = \sum_{i=1}^{n} X_i Y_i$$
or in matrix notation:
$$\begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}.$$
From the first equation it is easy to show that:
$$\hat{\alpha} = \bar{Y} - \bar{X}\hat{\beta}$$
so the difficulty is in obtaining $\hat{\beta}$. Solving for $\hat{\beta}$ using Cramer's rule we find that:
$$\hat{\beta} = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}.$$

Example: Suppose one has data on the consumption of $n = 4$ families along with their income:

$Y_i$: 72, 58, 63, 55
$X_i$: 98, 80, 91, 73

where $Y_i$ is the consumption and $X_i$ is the income of family $i$. We wish to estimate a consumption function of the form:
$$Y_i = \alpha + \beta X_i + e_i$$
where $\beta$ is the marginal propensity to consume. The sum of squares is then:
$$S(\alpha, \beta) = (72 - \alpha - 98\beta)^2 + (58 - \alpha - 80\beta)^2 + (63 - \alpha - 91\beta)^2 + (55 - \alpha - 73\beta)^2.$$
To calculate $\hat{\beta}$ and $\hat{\alpha}$ we need:
$$\sum_{i=1}^{4} X_i = 98 + 80 + 91 + 73 = 342 \implies \bar{X} = \frac{342}{4} = 85.5$$
$$\sum_{i=1}^{4} X_i^2 = 98^2 + 80^2 + 91^2 + 73^2 = 29614$$
$$\sum_{i=1}^{4} X_i Y_i = 98 \times 72 + 80 \times 58 + 91 \times 63 + 73 \times 55 = 21444$$
$$\sum_{i=1}^{4} Y_i = 72 + 58 + 63 + 55 = 248 \implies \bar{Y} = \frac{248}{4} = 62.$$
It follows then that:
$$\hat{\beta} = \frac{4 \times 21444 - 342 \times 248}{4 \times 29614 - 342^2} = 0.643$$
and:
$$\hat{\alpha} = \bar{Y} - \bar{X}\hat{\beta} = 62 - 0.643 \times 85.5 = 7.023.$$
Thus the estimated consumption function is:
$$\hat{Y}_i = 7.023 + 0.643 X_i$$
and the estimated marginal propensity to consume is $0.643$.
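Since the normal equations above are exactly what any least-squares routine solves, the hand calculation is easy to cross-check. The sketch below (an aside, not part of the text) reproduces the estimates for the four-family data:

```python
import numpy as np

# Four-family consumption (Y) and income (X) data from the example
Y = np.array([72.0, 58.0, 63.0, 55.0])
X = np.array([98.0, 80.0, 91.0, 73.0])
n = len(Y)

# Closed-form least squares estimators from the text
beta_hat = (n * (X * Y).sum() - X.sum() * Y.sum()) / (n * (X**2).sum() - X.sum()**2)
alpha_hat = Y.mean() - X.mean() * beta_hat
# beta_hat = 0.6434...; alpha_hat = 6.987 at full precision
# (the text's 7.023 comes from rounding beta_hat to 0.643 before computing alpha_hat)
print(beta_hat, alpha_hat)

# Cross-check against numpy's least-squares solver on the design matrix [1, X]
A = np.column_stack([np.ones(n), X])
print(np.linalg.lstsq(A, Y, rcond=None)[0])   # same alpha_hat, beta_hat
```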
4.7.2 Maximum Likelihood

Maximum likelihood can also be applied to cases where $\theta$ is a vector of parameters:
$$\theta = (\theta_1, \theta_2, \ldots, \theta_p)$$
so that the likelihood $L(\theta) = L(\theta_1, \theta_2, \ldots, \theta_p)$ is a multivariate function. As before, we estimate $\theta$ by maximizing $L(\theta)$ and denote the solution as $\hat{\theta}$, which solves the first-order conditions:
$$\frac{\partial L(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)}{\partial \theta_j} = 0 \quad \text{for } j = 1, 2, \ldots, p$$
or equivalently, if we define the log-likelihood as $l(\theta) = \ln(L(\theta))$, then:
$$\frac{\partial l(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)}{\partial \theta_j} = 0 \quad \text{for } j = 1, 2, \ldots, p.$$
Once $\hat{\theta}$ is found from the first-order conditions, a 95% confidence interval for $\theta_j$ can be found as follows. Let the $p \times p$ matrix $H(\hat{\theta})$ be the Hessian of the log-likelihood evaluated at $\hat{\theta}$; its negative, $-H(\hat{\theta})$, is referred to as the (observed) information matrix. Now calculate
$$\Delta = \left(-H(\hat{\theta})\right)^{-1}$$
and let $\delta_j$ be the $j$th diagonal element of $\Delta$. Then a 95% confidence interval for the unknown $\theta_j$ is
$$\hat{\theta}_j \pm 1.96 \times \sqrt{\delta_j}.$$

Example 1: Suppose that $Y_i \sim N(\mu, \sigma^2)$ so that $Y_i$ has a mean of $\mu$ and a standard deviation of $\sigma$. We wish to estimate $\theta_1 = \mu$ and $\theta_2 = \sigma$ using maximum likelihood from a sample $Y_1, Y_2, \ldots, Y_n$. The likelihood function is:
$$L(\mu, \sigma) = (2\pi)^{-n/2} \sigma^{-n} e^{-\frac{1}{2\sigma^2}\left((Y_1 - \mu)^2 + (Y_2 - \mu)^2 + \cdots + (Y_n - \mu)^2\right)}.$$
The log-likelihood is given by:
$$l(\mu, \sigma) = \ln(L(\mu, \sigma)) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2}\left((Y_1 - \mu)^2 + (Y_2 - \mu)^2 + \cdots + (Y_n - \mu)^2\right).$$
The maximum likelihood estimator of $\mu$ is $\hat{\mu} = \bar{Y}$, the sample mean, since:
$$\frac{\partial l(\mu, \sigma)}{\partial \mu} = \frac{(Y_1 - \mu) + (Y_2 - \mu) + \cdots + (Y_n - \mu)}{\sigma^2}$$
so that:
$$\frac{\partial l(\hat{\mu}, \hat{\sigma})}{\partial \mu} = 0 \implies Y_1 + Y_2 + \cdots + Y_n = n\hat{\mu} \implies \hat{\mu} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} = \bar{Y}.$$
The maximum likelihood estimator of $\sigma$ is the sample standard deviation since:
$$\frac{\partial l(\mu, \sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\left((Y_1 - \mu)^2 + (Y_2 - \mu)^2 + \cdots + (Y_n - \mu)^2\right)$$
and setting $\frac{\partial l(\hat{\mu}, \hat{\sigma})}{\partial \sigma} = 0$ gives:
$$-\frac{n}{\hat{\sigma}} + \frac{1}{\hat{\sigma}^3}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right) = 0$$
$$\implies \hat{\sigma}^2 = \frac{1}{n}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right)$$
$$\implies \hat{\sigma} = \sqrt{\frac{1}{n}\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right)}.$$
Now to calculate confidence intervals for $\hat{\mu}$ and $\hat{\sigma}$ we need the Hessian of $l(\mu, \sigma)$. We have:
$$\frac{\partial^2 l(\hat{\mu}, \hat{\sigma})}{\partial \mu^2} = \frac{-1 - 1 - \cdots - 1}{\hat{\sigma}^2} = -\frac{n}{\hat{\sigma}^2}$$
$$\frac{\partial^2 l(\hat{\mu}, \hat{\sigma})}{\partial \mu \partial \sigma} = -2\,\frac{(Y_1 - \hat{\mu}) + (Y_2 - \hat{\mu}) + \cdots + (Y_n - \hat{\mu})}{\hat{\sigma}^3} = -2\,\frac{Y_1 + Y_2 + \cdots + Y_n - n\bar{Y}}{\hat{\sigma}^3} = 0$$
$$\frac{\partial^2 l(\hat{\mu}, \hat{\sigma})}{\partial \sigma^2} = \frac{n}{\hat{\sigma}^2} - \frac{3}{\hat{\sigma}^4}\underbrace{\left((Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_n - \bar{Y})^2\right)}_{= n\hat{\sigma}^2} = \frac{n}{\hat{\sigma}^2} - \frac{3n}{\hat{\sigma}^2} = -\frac{2n}{\hat{\sigma}^2}$$
and so the Hessian is:
$$H(\hat{\mu}, \hat{\sigma}) = \begin{bmatrix} -\frac{n}{\hat{\sigma}^2} & 0 \\ 0 & -\frac{2n}{\hat{\sigma}^2} \end{bmatrix}$$
and hence:
$$\Delta = \left(-H(\hat{\mu}, \hat{\sigma})\right)^{-1} = \begin{bmatrix} \frac{\hat{\sigma}^2}{n} & 0 \\ 0 & \frac{\hat{\sigma}^2}{2n} \end{bmatrix}.$$
Thus a 95% confidence interval for the unknown $\mu$ takes the form:
$$\hat{\mu} \pm 1.96\sqrt{\frac{\hat{\sigma}^2}{n}}$$
while a 95% confidence interval for the unknown $\sigma$ takes the form:
$$\hat{\sigma} \pm 1.96\sqrt{\frac{\hat{\sigma}^2}{2n}}.$$
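As an aside (not in the text), the closed forms $\hat{\mu} = \bar{Y}$ and $\hat{\sigma}$ equal to the $1/n$ sample standard deviation can be confirmed by maximizing the log-likelihood numerically. A minimal sketch using scipy, on simulated data with assumed true values $\mu = 3$ and $\sigma = 1.5$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
Y = rng.normal(loc=3.0, scale=1.5, size=200)   # simulated sample (assumed values)

# Negative log-likelihood for Yi ~ N(mu, sigma^2), dropping the constant term;
# parameterizing sigma = exp(t) keeps sigma > 0 throughout the search
def negloglik(theta):
    mu, t = theta
    sigma = np.exp(t)
    return len(Y) * np.log(sigma) + np.sum((Y - mu) ** 2) / (2.0 * sigma**2)

res = minimize(negloglik, x0=[0.0, 0.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Agrees with the closed forms: the sample mean and the 1/n standard deviation
print(np.allclose([mu_hat, sigma_hat], [Y.mean(), Y.std(ddof=0)], atol=1e-5))
```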
Example 2: Suppose we are given $n = 5$ observations:
$$Y_1 = 5.5, \quad Y_2 = 3.3, \quad Y_3 = 7.1, \quad Y_4 = 9.2, \quad Y_5 = 4.1.$$
We are seeking the values of $\mu$ and $\sigma$ which maximize the log-likelihood $l(\mu, \sigma)$.

[Figure: surface plot of the log-likelihood $l(\mu, \sigma)$ as a function of $\mu$ and $\sigma$]

We have:
$$\hat{\mu} = \bar{Y} = \frac{5.5 + 3.3 + 7.1 + 9.2 + 4.1}{5} = 5.84$$
and
$$\hat{\sigma} = \sqrt{\frac{(5.5 - 5.84)^2 + (3.3 - 5.84)^2 + (7.1 - 5.84)^2 + (9.2 - 5.84)^2 + (4.1 - 5.84)^2}{5}} = 2.12.$$
A 95% confidence interval for the unknown $\mu$ is then:
$$5.84 \pm 1.96\sqrt{\frac{2.12^2}{5}}$$
or:
$$5.84 \pm 1.8583.$$
A 95% confidence interval for the unknown $\sigma$ is then:
$$2.12 \pm 1.96\sqrt{\frac{2.12^2}{2 \times 5}}$$
or:
$$2.12 \pm 1.314.$$
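These figures are quick to reproduce (an aside, not part of the text):

```python
import numpy as np

Y = np.array([5.5, 3.3, 7.1, 9.2, 4.1])
n = len(Y)

mu_hat = Y.mean()            # 5.84
sigma_hat = Y.std(ddof=0)    # MLE uses the 1/n variance: 2.12

half_mu = 1.96 * sigma_hat / np.sqrt(n)          # half-width for mu: 1.8583
half_sigma = 1.96 * sigma_hat / np.sqrt(2 * n)   # half-width for sigma: 1.314
print(mu_hat, sigma_hat, half_mu, half_sigma)
```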