Least Squares

Learning goals: find the best solution (by one measure, anyway) of an inconsistent system of equations. Learn to apply the algebra, geometry, and calculus of projections to this problem.

Example: You plant a seedling at noon on Monday. Each day after at noon you measure its height. On Tuesday it is 3 cm tall. Wednesday it is 5 cm tall. Thursday it is 6 cm tall. How tall was it when you planted it?

Let's assume that the seedling has a constant growth rate—obviously an incorrect assumption, but what else can we assume? (Assuming an arithmetic growth rate gives a height of zero on planting day, and negative growth from here on out; geometric growth would give it a height of –1 cm on day zero.) If the growth rate is r and the initial height is h, we end up with the following three equations in two unknowns: r + h = 3, 2r + h = 5, 3r + h = 6. As usual, when there are more equations than unknowns we expect the system to be inconsistent, and reduction quickly proves that point:

$$\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} r \\ h \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 0 & -1 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} r \\ h \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ -3 \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 0 & -1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} r \\ h \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix},$$

whose last row demands 0 = –1. Of course, the problem is that our right-hand side b = (3, 5, 6) is not in the column space of the matrix A, so there is no solution to Ax = b. If it were in the column space, we could solve the system with no problems. So what should we do?

Let's ask how close we can come to solving the equation. Whatever r and h we pick,

$$\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} r \\ h \end{bmatrix}$$

is guaranteed to be in the column space of the matrix. So instead of using the real b, let's find the thing in the column space that is as close to b as possible, and solve for that instead! Let p be the projection of b onto the column space. Then the error vector e = b – p is as small as possible. Let's call the solution to this new problem x̂, so we are solving A x̂ = p.

The one thing we know about e is that it is orthogonal to the column space, so it is in the left nullspace. That is, Aᵀe = 0. This means that Aᵀ(b – A x̂) = 0, or AᵀA x̂ = Aᵀb. So instead of Ax = b, we solve the normal equations AᵀA x̂ = Aᵀb. (We will show later that this always has a solution.) In this case, we multiply both sides by

$$A^T = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix}$$

to obtain the system

$$\begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}\begin{bmatrix} \hat r \\ \hat h \end{bmatrix} = \begin{bmatrix} 31 \\ 14 \end{bmatrix}.$$

A little elimination shows that r̂ = 3/2 and ĥ = 5/3. So we guess our little plant started out 5/3 cm tall and grew at a rate of 3/2 cm/day. This is clearly wrong, since it would predict heights of 19/6, 14/3, and 37/6 instead of 3, 5, and 6, so we're off by –1/6, 1/3, and –1/6 respectively. That is, e = (–1/6, 1/3, –1/6).
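If you want to check this arithmetic by machine, here is a minimal sketch in Python, assuming NumPy is available (the notes themselves use no software): it solves the normal equations for the seedling data and cross-checks with NumPy's built-in least-squares routine.

```python
import numpy as np

# Seedling data: height (cm) measured on days 1, 2, 3 after planting.
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])   # columns: [day, 1], for height = r*day + h
b = np.array([3.0, 5.0, 6.0])

# Solve the normal equations A^T A x = A^T b directly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                 # [1.5  1.6667]  ->  r = 3/2, h = 5/3

# np.linalg.lstsq minimizes ||b - Ax|| and gives the same answer.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_lstsq)               # [1.5  1.6667]

# The error vector e = b - A x_hat is (-1/6, 1/3, -1/6), as computed above.
print(b - A @ x_hat)         # [-0.1667  0.3333 -0.1667]
```

Since lstsq minimizes the length of b – Ax directly, it agrees with the normal-equations solution whenever the columns of A are independent.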
This is an example of a very general problem: not just "fit a line through a bunch of data points that may not be collinear" but, in general, "find the best solution to Ax = b when b is not in the column space of A." And our solution method is completely general. We know that we can't solve Ax = b unless b is in the column space. But how close can we get? Well, we must first decide how we are going to measure closeness. One thing to do is to take the x̂ we find and measure the error as e = b – A x̂. How are we going to decide how bad the error is? The easiest thing to do is to minimize the length of this error vector. We can approach this several ways.

Geometry

Geometrically, to minimize the length of e, since A x̂ is in the column space of A, we want e to be orthogonal to this space, hence in the left nullspace. Then Aᵀe = 0. Writing this out gives the normal equations AᵀA x̂ = Aᵀb. We will see shortly why this always has a solution.

Algebra

We know that b is not in the column space, so we break b up as b = p + e, where p is the projection of b onto the column space and e is the error vector. Then we solve A x̂ = p. Well, we know that the formula for projection onto the column space of A is p = A(AᵀA)⁻¹Aᵀb (assuming the columns of A are independent—more later if they're not!). If A x̂ = p, then x̂ = (AᵀA)⁻¹Aᵀb, and we get the solution above.

Calculus

We could even use the techniques of calculus to minimize the error (or rather, the square of its length). I'll spare you the details, but we get—unsurprisingly—the same normal equations AᵀA x̂ = Aᵀb.

This method of finding a "best" solution is called the method of least squares, because we minimize the sum of the squares of the errors. So the method is: if Ax = b is solvable, great. If not, multiply both sides by Aᵀ to obtain the normal equations and solve AᵀA x̂ = Aᵀb.

We know that if the columns of A are independent then AᵀA is invertible, and there is a unique solution to this system. What if they aren't? It turns out that we can still always solve the system, but now the solution won't be unique: we can add any null vector of A to a particular x̂ and obtain another solution. This will be the point of the pseudoinverse. We will pick out the particular solution that is in the row space of A. That way, any other solution will be this one plus a null vector, which is orthogonal to the row space, so any other solution will be longer than the row-space solution. Thus not only will the pseudoinverse make the error as small as possible, it will make the choice of x̂ as small as possible, too! (A numerical sketch of this appears below.)

So why is AᵀA x̂ = Aᵀb always solvable? Well, we use our Fundamental Theorem of Linear Algebra. The column space C(AᵀA) is the orthogonal complement of the left nullspace of AᵀA. This is easier in symbols:

$$C(A^TA) = N\big((A^TA)^T\big)^\perp = N(A^TA)^\perp = N(A)^\perp = C(A^T)$$

(we've seen that A and AᵀA have the same nullspace: if Ax = 0 then certainly AᵀAx = 0, and if AᵀAx = 0, we multiply on the left by xᵀ and find that ‖Ax‖² = 0, so Ax = 0). But since the column spaces of AᵀA and Aᵀ are the same, and Aᵀb is in the column space of Aᵀ, we can certainly always solve AᵀA x̂ = Aᵀb.

The method of least squares is very widely applicable, not just to finding best-fit lines. Later on, we may find best-fit functions—the theory of Fourier series. As another example, we will now find a best-fit parabola.
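Before the parabola example, here is a small numerical illustration of the minimum-norm claim about the pseudoinverse. This is a sketch only, assuming NumPy, and the rank-deficient matrix below is made up for illustration (none of these numbers come from the notes).

```python
import numpy as np

# A has dependent columns: the third column is the sum of the first two,
# so A^T A is singular and the least-squares solution is not unique.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 4.0, 1.0])

# The pseudoinverse picks the solution of the normal equations that lies
# in the row space of A: the shortest x_hat among all of them.
x_pinv = np.linalg.pinv(A) @ b

# Adding any null vector of A gives another solution with the same A @ x_hat.
n = np.array([1.0, 1.0, -1.0])      # check: A @ n = 0
x_other = x_pinv + n

print(np.allclose(A @ x_pinv, A @ x_other))              # True: same projection
print(np.linalg.norm(x_pinv) < np.linalg.norm(x_other))  # True: pinv solution is shortest
```

Because the pseudoinverse solution lies in the row space and the null vector is orthogonal to it, the lengths add by Pythagoras, which is exactly why the row-space solution is the shortest one.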
Example: Measure the acceleration due to gravity. An object is dropped, the distance it falls is measured, and the following data are collected, with (t, d) given as (time in tenths of a second, distance in centimeters): (1, 5), (2, 19), (3, 44), (4, 79), (5, 122). There may be some measurement error, the object may not have been at exactly zero when it was released, and it might have had some initial speed (for instance, it might have been hand-held, and it was hard to get the timing just right or to hold the hand perfectly steady). So the formula for distance fallen ought to be gt²/2 + vt + d, where g is the acceleration of gravity, v is the initial downward speed, and d is the distance below zero-level from which it was actually dropped.

Putting in the data gives five equations in three unknowns:

1g/2 + 1v + d = 5
4g/2 + 2v + d = 19
9g/2 + 3v + d = 44
16g/2 + 4v + d = 79
25g/2 + 5v + d = 122

Our system has the matrix form

$$\begin{bmatrix} 1/2 & 1 & 1 \\ 4/2 & 2 & 1 \\ 9/2 & 3 & 1 \\ 16/2 & 4 & 1 \\ 25/2 & 5 & 1 \end{bmatrix}\begin{bmatrix} g \\ v \\ d \end{bmatrix} = \begin{bmatrix} 5 \\ 19 \\ 44 \\ 79 \\ 122 \end{bmatrix}.$$

Note that even if the system were consistent, the least squares method would still produce the correct answer, for the projection would just be b and the error would be 0 (we'd be projecting something already in the column space onto the column space!). So let's not bother to check whether this system is consistent (it's not) and pass straight to the normal equations. Multiplying both sides by the transpose of the coefficient matrix gives:

$$\begin{bmatrix} 979/4 & 225/2 & 55/2 \\ 225/2 & 55 & 15 \\ 55/2 & 15 & 5 \end{bmatrix}\begin{bmatrix} \hat g \\ \hat v \\ \hat d \end{bmatrix} = \begin{bmatrix} 4791/2 \\ 1101 \\ 269 \end{bmatrix}.$$

I'll spare you the details, but the solution to this system is (ĝ, v̂, d̂) = (68/7, 9/35, –2/5). This tells us that the object was (up to experimental error) dropped from 2/5 cm above the presumed starting point, with an initial downward velocity of 9/35 cm/(0.1 s) = 90/35 cm/s (pretty fast!), and subjected to a gravitational acceleration of 68/7 cm/(0.1 s)² = 68/7 m/s², or about 9.7 m/s².
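Here is the same computation done numerically, again a sketch assuming NumPy: it rebuilds the 5 × 3 coefficient matrix from the data and solves the normal equations, reproducing 68/7, 9/35, and –2/5 in decimal form.

```python
import numpy as np

# Gravity data: time in tenths of a second, distance fallen in cm.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([5.0, 19.0, 44.0, 79.0, 122.0])

# Model: distance = (g/2) t^2 + v t + d, so the columns of A are t^2/2, t, 1.
A = np.column_stack([t**2 / 2, t, np.ones_like(t)])

# Solve the normal equations A^T A x = A^T b.
g_hat, v_hat, d_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(g_hat, v_hat, d_hat)   # 9.7142857...  0.2571428...  -0.4
# These are 68/7, 9/35, and -2/5, matching the hand computation.
# Note: g in cm/(0.1 s)^2 equals g in m/s^2 numerically, since the factor
# of 100 from the time units cancels the factor of 100 from cm to m.
```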
Reading: 4.3
Problems: 4.3: 1, 2, 4, 5, 7, 9, 10, 12–16, 17, 22, 26, 27