
XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS
Today we’re going to talk about solving systems of linear equations. These are
problems that give a couple of equations with a couple of unknowns, like:
\[
\begin{aligned}
6 &= 2x_1 + 3x_2 \\
7 &= 4x_1 + 5x_2
\end{aligned}
\]
How do we solve these things, especially when they get complicated? How do we
know when a system has a solution, and when is it unique?
Provided that the system is fairly simple, it might be easiest to solve using
successive substitution. Given a system that looks like this:
\[
\begin{aligned}
b_1 &= a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \\
b_2 &= a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \\
b_3 &= a_{31} x_1 + a_{32} x_2 + a_{33} x_3
\end{aligned}
\]
(For simplicity, most of the things I show here will be 3 × 3 systems, but everything
works just as well with more variables.) You pick any equation and any variable, and
solve that equation for that variable in terms of the constants and the other variables.
Let's say we pick equation one and x1:
\[
x_1 = \frac{1}{a_{11}} \left( b_1 - a_{12} x_2 - a_{13} x_3 \right)
\]
Then we substitute this value of x1 back into the other two equations,
\[
\begin{aligned}
b_2 &= \frac{a_{21}}{a_{11}} \left( b_1 - a_{12} x_2 - a_{13} x_3 \right) + a_{22} x_2 + a_{23} x_3 \\
b_3 &= \frac{a_{31}}{a_{11}} \left( b_1 - a_{12} x_2 - a_{13} x_3 \right) + a_{32} x_2 + a_{33} x_3
\end{aligned}
\]
And then we have two linear equations in two unknowns:
\[
\begin{aligned}
b_2 &= \frac{a_{21}}{a_{11}} b_1 + \left( a_{22} - \frac{a_{12} a_{21}}{a_{11}} \right) x_2 + \left( a_{23} - \frac{a_{13} a_{21}}{a_{11}} \right) x_3 \\
b_3 &= \frac{a_{31}}{a_{11}} b_1 + \left( a_{32} - \frac{a_{12} a_{31}}{a_{11}} \right) x_2 + \left( a_{33} - \frac{a_{13} a_{31}}{a_{11}} \right) x_3
\end{aligned}
\]
Once again, we pick one equation and solve it for a particular variable:
\[
x_2 = \frac{b_2 - \dfrac{a_{21}}{a_{11}} b_1 - \left( a_{23} - \dfrac{a_{13} a_{21}}{a_{11}} \right) x_3}{a_{22} - \dfrac{a_{12} a_{21}}{a_{11}}}
\]
After substituting into the remaining equation, we get a single expression for the last
of the variables:
\[
x_3 = \left[ b_3 - \frac{a_{31}}{a_{11}} b_1 - \frac{\left( a_{32} - \dfrac{a_{12} a_{31}}{a_{11}} \right)\left( b_2 - \dfrac{a_{21}}{a_{11}} b_1 \right)}{a_{22} - \dfrac{a_{12} a_{21}}{a_{11}}} \right]
\left[ a_{33} - \frac{a_{13} a_{31}}{a_{11}} - \frac{\left( a_{32} - \dfrac{a_{12} a_{31}}{a_{11}} \right)\left( a_{23} - \dfrac{a_{13} a_{21}}{a_{11}} \right)}{a_{22} - \dfrac{a_{12} a_{21}}{a_{11}}} \right]^{-1}
\]
Knowing what x3 is, we can find the value of x2 and then x1 . However, this is a
tiring process, especially when you start off with a bunch of equations, and there are
no apparent simple substitutions.
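If you ever need to grind through such a substitution, a computer algebra system can do the bookkeeping for you. Here is a minimal Python sketch, assuming the sympy package is available, that solves the little system from the start of this section by exactly this substitute-and-solve recipe:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
eq1 = sp.Eq(6, 2*x1 + 3*x2)
eq2 = sp.Eq(7, 4*x1 + 5*x2)

# solve equation one for x1 in terms of x2 ...
x1_expr = sp.solve(eq1, x1)[0]                   # x1 = 3 - (3/2) x2
# ... substitute into equation two and solve for x2 ...
x2_val = sp.solve(eq2.subs(x1, x1_expr), x2)[0]
# ... then back-substitute to recover x1.
x1_val = x1_expr.subs(x2, x2_val)
print(x1_val, x2_val)                            # -9/2  5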
It’s going to be easier to do this in matrix form. Let A be the matrix of coefficients
on the system of equations, and b the constants. We can write this system of
equations as:
\[
\begin{aligned}
b_1 &= a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \\
b_2 &= a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \\
b_3 &= a_{31} x_1 + a_{32} x_2 + a_{33} x_3
\end{aligned}
\qquad\Rightarrow\qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
  = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
  = A x
\]
And the question is how to solve this system for the vector x of unknowns. There
are three ways, more or less.
In the first method, we essentially use Gaussian elimination in matrix form. First,
we write out the augmented matrix:
\[
\left( \begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & b_1 \\
a_{21} & a_{22} & a_{23} & b_2 \\
a_{31} & a_{32} & a_{33} & b_3
\end{array} \right)
\]
This is shorthand for saying "the left-hand side of the matrix, times the vector x,
equals the right-hand side of the matrix." Now, if the left-hand side equals the
identity matrix,
\[
\left( \begin{array}{ccc|c}
1 & 0 & 0 & c_1 \\
0 & 1 & 0 & c_2 \\
0 & 0 & 1 & c_3
\end{array} \right)
\]
what we have is that the identity matrix times the vector x (which is just x itself)
equals the right-hand side, so x = c. Whenever the left-hand side equals the identity
matrix, the right-hand side is a solution for x.
Given the augmented matrix corresponding to the system of linear equations, our
mission (should we choose to accept it) is to get the left-hand side into the form of the
identity matrix, using only these three elementary row operations:
1. interchange two rows of the matrix;
2. add (or subtract) a multiple of one row to another row; and
3. multiply each element in a row by the same nonzero number.
We perform these operations on every element of the row, on both the left-hand side
and the right-hand side of the augmented matrix. With the particular matrix given
above, this is what the three permissible elementary row operations look like:
\[
\left( \begin{array}{ccc|c}
a_{21} & a_{22} & a_{23} & b_2 \\
a_{11} & a_{12} & a_{13} & b_1 \\
a_{31} & a_{32} & a_{33} & b_3
\end{array} \right)
\qquad
\left( \begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & b_1 \\
a_{21} - \lambda a_{31} & a_{22} - \lambda a_{32} & a_{23} - \lambda a_{33} & b_2 - \lambda b_3 \\
a_{31} & a_{32} & a_{33} & b_3
\end{array} \right)
\qquad
\left( \begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & b_1 \\
a_{21} & a_{22} & a_{23} & b_2 \\
\mu a_{31} & \mu a_{32} & \mu a_{33} & \mu b_3
\end{array} \right)
\]
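In code, each of these operations is a one-line manipulation of the rows of an array. Here is a minimal sketch, assuming numpy is available, with the three operations applied to some augmented matrix M (the particular numbers are just placeholders):

import numpy as np

# an arbitrary 3x4 augmented matrix [A | b], purely for illustration
M = np.array([[-1., 2., 3., -7.],
              [ 4., 5., 6.,  5.],
              [ 7., 8., 9., 11.]])

M[[0, 1]] = M[[1, 0]]       # 1. interchange rows one and two
M[1] = M[1] - 2.0 * M[2]    # 2. subtract a multiple (here, 2) of row three from row two
M[2] = 0.5 * M[2]           # 3. multiply every element of row three by the same nonzero number
print(M)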
My strategy for solving these is usually first to arrange the equations in a way that
makes sense (with experience, you’ll figure out what’s easiest). Then I divide the first
row through by the constant a11 :
\[
\left( \begin{array}{ccc|c}
1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}} \\
a_{21} & a_{22} & a_{23} & b_2 \\
a_{31} & a_{32} & a_{33} & b_3
\end{array} \right)
\]
Then I subtract a21 times the first row from the second, and a31 times the first row
from the third:
\[
\left( \begin{array}{ccc|c}
1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}} \\
0 & a_{22} - \frac{a_{21} a_{12}}{a_{11}} & a_{23} - \frac{a_{21} a_{13}}{a_{11}} & b_2 - \frac{a_{21} b_1}{a_{11}} \\
0 & a_{32} - \frac{a_{31} a_{12}}{a_{11}} & a_{33} - \frac{a_{31} a_{13}}{a_{11}} & b_3 - \frac{a_{31} b_1}{a_{11}}
\end{array} \right)
\]
I do a similar thing for the second row now, dividing it through by the coefficient on
x2 in that row:
\[
\left( \begin{array}{ccc|c}
1 & \frac{a_{12}}{a_{11}} & \frac{a_{13}}{a_{11}} & \frac{b_1}{a_{11}} \\
0 & 1 & \left( a_{23} - \frac{a_{21} a_{13}}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} & \left( b_2 - \frac{a_{21} b_1}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} \\
0 & a_{32} - \frac{a_{31} a_{12}}{a_{11}} & a_{33} - \frac{a_{31} a_{13}}{a_{11}} & b_3 - \frac{a_{31} b_1}{a_{11}}
\end{array} \right)
\]
In order to get zeros in the second places of the first and third rows, I multiply the
second row by the appropriate constant and subtract off:
\[
\left( \begin{array}{ccc|c}
1 & 0 & \frac{a_{13}}{a_{11}} - \frac{a_{12}}{a_{11}}\left( a_{23} - \frac{a_{21} a_{13}}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} & \frac{b_1}{a_{11}} - \frac{a_{12}}{a_{11}}\left( b_2 - \frac{a_{21} b_1}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} \\
0 & 1 & \left( a_{23} - \frac{a_{21} a_{13}}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} & \left( b_2 - \frac{a_{21} b_1}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} \\
0 & 0 & a_{33} - \frac{a_{31} a_{13}}{a_{11}} - \left( a_{32} - \frac{a_{31} a_{12}}{a_{11}} \right)\left( a_{23} - \frac{a_{21} a_{13}}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1} & b_3 - \frac{a_{31} b_1}{a_{11}} - \left( a_{32} - \frac{a_{31} a_{12}}{a_{11}} \right)\left( b_2 - \frac{a_{21} b_1}{a_{11}} \right)\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right)^{-1}
\end{array} \right)
\]
And so on. Though this looks really nasty when presented this way, it usually turns
out to work pretty well. Let's try an example:
\[
\begin{aligned}
-7 &= -x_1 + 2x_2 + 3x_3 \\
5 &= 4x_1 + 5x_2 + 6x_3 \\
11 &= 7x_1 + 8x_2 + 9x_3
\end{aligned}
\qquad\Rightarrow\qquad
\begin{pmatrix} -7 \\ 5 \\ 11 \end{pmatrix}
= \begin{pmatrix} -1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\qquad\Rightarrow\qquad
\left( \begin{array}{ccc|c} -1 & 2 & 3 & -7 \\ 4 & 5 & 6 & 5 \\ 7 & 8 & 9 & 11 \end{array} \right)
\]
The first step is to divide the first row by the coefficient in the top left, which in this
case turns out to be negative one. Then we subtract the top row times four from
the second row, and the top row times seven from the bottom row:
\[
\left( \begin{array}{ccc|c} -1 & 2 & 3 & -7 \\ 4 & 5 & 6 & 5 \\ 7 & 8 & 9 & 11 \end{array} \right)
\rightarrow
\left( \begin{array}{ccc|c} 1 & -2 & -3 & 7 \\ 4 & 5 & 6 & 5 \\ 7 & 8 & 9 & 11 \end{array} \right)
\rightarrow
\left( \begin{array}{ccc|c} 1 & -2 & -3 & 7 \\ 0 & 13 & 18 & -23 \\ 0 & 22 & 30 & -38 \end{array} \right)
\]
Then we divide the second row by 13 in order to get a leading 1, and add two times
the second row to the first row, and subtract 22 times the second row from the last:
\[
\left( \begin{array}{ccc|c} 1 & -2 & -3 & 7 \\ 0 & 13 & 18 & -23 \\ 0 & 22 & 30 & -38 \end{array} \right)
\rightarrow
\left( \begin{array}{ccc|c} 1 & -2 & -3 & 7 \\ 0 & 1 & \frac{18}{13} & -\frac{23}{13} \\ 0 & 22 & 30 & -38 \end{array} \right)
\rightarrow
\left( \begin{array}{ccc|c} 1 & 0 & -\frac{3}{13} & \frac{45}{13} \\ 0 & 1 & \frac{18}{13} & -\frac{23}{13} \\ 0 & 0 & -\frac{6}{13} & \frac{12}{13} \end{array} \right)
\]
Finally, we divide the last row by −6/13, and subtract the appropriate amounts off from the
first and second rows:
\[
\left( \begin{array}{ccc|c} 1 & 0 & -\frac{3}{13} & \frac{45}{13} \\ 0 & 1 & \frac{18}{13} & -\frac{23}{13} \\ 0 & 0 & -\frac{6}{13} & \frac{12}{13} \end{array} \right)
\rightarrow
\left( \begin{array}{ccc|c} 1 & 0 & -\frac{3}{13} & \frac{45}{13} \\ 0 & 1 & \frac{18}{13} & -\frac{23}{13} \\ 0 & 0 & 1 & -2 \end{array} \right)
\rightarrow
\left( \begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -2 \end{array} \right)
\]
The right-hand side of the matrix now tells us what the vector x should equal. We
should now go back and verify (by multiplying out the original system) that this works.
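For the record, here is a minimal Python sketch, assuming numpy is available, of the same elimination carried all the way to the identity on this example, including the verification step. (This particular matrix never needs a row interchange; a general-purpose routine would also pivot.)

import numpy as np

A = np.array([[-1., 2., 3.],
              [ 4., 5., 6.],
              [ 7., 8., 9.]])
b = np.array([-7., 5., 11.])
M = np.hstack([A, b.reshape(-1, 1)])     # the augmented matrix [A | b]

n = len(b)
for i in range(n):
    M[i] = M[i] / M[i, i]                # scale row i so its pivot becomes 1
    for j in range(n):
        if j != i:
            M[j] = M[j] - M[j, i] * M[i] # zero out column i in every other row

x = M[:, -1]                             # the right-hand column now holds the solution
print(x)                                 # [ 3.  1. -2.]
print(np.allclose(A @ x, b))             # True: the solution checks out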
Sometimes, you might try to work one of these systems and end up with a very funny
(contradictory) result in the end, or an entire row might turn into zeros (which leaves
you with no chance of turning its diagonal element into a one). Most likely, this is a
sign that you have made an arithmetic error—but if you go back and check your
steps and this is still the outcome, then you have encountered a system without a
solution or with infinitely many solutions. I’ll talk more about these later.
The second way of solving a system of equations is so simple people often overlook
it. Suppose we have the system:
\[
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\qquad\Leftrightarrow\qquad b = A x
\]
Provided that A is an invertible n × n matrix, we can solve this by premultiplying
both sides by A^{-1}:
\[
x = A^{-1} b
\]
and then performing the appropriate matrix multiplication. Let's look at that
example again:
\[
\begin{pmatrix} -7 \\ 5 \\ 11 \end{pmatrix}
= \begin{pmatrix} -1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\qquad\Rightarrow\qquad
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} -1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}^{-1}
  \begin{pmatrix} -7 \\ 5 \\ 11 \end{pmatrix}
\]
Using the formula for matrix inversion, we find this:
\[
x = A^{-1} b = \frac{1}{\det A}
\begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix}
\begin{pmatrix} -7 \\ 5 \\ 11 \end{pmatrix}
= \frac{1}{6}
\begin{pmatrix} -3 & 6 & -3 \\ 6 & -30 & 18 \\ -3 & 22 & -13 \end{pmatrix}
\begin{pmatrix} -7 \\ 5 \\ 11 \end{pmatrix}
= \begin{pmatrix} 3 \\ 1 \\ -2 \end{pmatrix}
\]
Pretty nifty that we can do it two ways and get the same solution, huh? Of course,
this method works only when the matrix is invertible; later, I’ll show how being
singular corresponds to a system with many or no solutions.
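Here is the same computation as a minimal Python sketch, assuming numpy is available. (In serious numerical work you would call a solver directly rather than form the inverse, but the inverse makes the x = A^{-1} b logic explicit.)

import numpy as np

A = np.array([[-1., 2., 3.],
              [ 4., 5., 6.],
              [ 7., 8., 9.]])
b = np.array([-7., 5., 11.])

x = np.linalg.inv(A) @ b                      # premultiply b by the inverse
print(x)                                      # [ 3.  1. -2.]
print(np.allclose(x, np.linalg.solve(A, b)))  # True: same answer as a direct solve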
If we look at the matrix inversion method, we observe an interesting pattern arising.
In the three by three case, what we have is that:
\[
\begin{aligned}
x_1 &= \frac{1}{\det A} \left( b_1 A_{11} + b_2 A_{21} + b_3 A_{31} \right) \\
x_2 &= \frac{1}{\det A} \left( b_1 A_{12} + b_2 A_{22} + b_3 A_{32} \right) \\
x_3 &= \frac{1}{\det A} \left( b_1 A_{13} + b_2 A_{23} + b_3 A_{33} \right)
\end{aligned}
\]
What does this look like? Well, these bear a remarkable resemblance to the formula
for determinants.
\[
b_1 A_{11} + b_2 A_{21} + b_3 A_{31} =
\begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}
\qquad
b_1 A_{12} + b_2 A_{22} + b_3 A_{32} =
\begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}
\qquad
b_1 A_{13} + b_2 A_{23} + b_3 A_{33} =
\begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix}
\]
Each right-hand side is just the cofactor expansion of the determinant down the column
that has been replaced by b.
So in fact all we have to do to solve this system of equations (much easier than
inverting a matrix) is to say that xi equals the determinant of the matrix formed by
replacing the i-th column of A with the vector b, divided by the determinant of A.
This is known as Cramer's Rule.
Theorem:
Let A be a nonsingular n × n matrix. Then the system of equations:
\[
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
  = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
  = A x
\]
has the unique solution
\[
x_i = \frac{\det B_i}{\det A}
\]
where Bi is the matrix formed by replacing the i-th column of A with the vector b .
Provided that you can remember this formula, this is usually the most efficient way
to solve a system of equations.
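Here is a minimal Python sketch of Cramer's Rule, assuming numpy is available; the function name cramer is just something I made up for illustration:

import numpy as np

def cramer(A, b):
    """Solve Ax = b for nonsingular A by Cramer's Rule."""
    det_A = np.linalg.det(A)           # assumed nonzero (A nonsingular)
    x = np.empty(len(b))
    for i in range(len(b)):
        B_i = A.copy()
        B_i[:, i] = b                  # B_i: A with its i-th column replaced by b
        x[i] = np.linalg.det(B_i) / det_A
    return x

A = np.array([[-1., 2., 3.],
              [ 4., 5., 6.],
              [ 7., 8., 9.]])
b = np.array([-7., 5., 11.])
print(cramer(A, b))                    # [ 3.  1. -2.]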
Recall that if we imagine a matrix as a bunch of vectors, the determinant measures
the area (or volume) spanned by these vectors. This area is largest when the vectors
are most at odds with one another: the closer they are to being orthogonal, the less
they have in common.
The first column of A is where x1 does all of its “explaining” of the outcome:
\[
\begin{aligned}
b_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\
b_2 &= a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n
\end{aligned}
\]
If x1 is very large (relative to the other variables), then the first column of A should
be very similar in direction to the outcome b , right? Only the magnitudes might
differ. In order to test how large this effect is, we take out this first column and
stick in b instead. If it’s true that x1 has the most effect on the outcome, then this
substitution should not change the shape of the area spanned by the matrix much,
only its size.
Another way of thinking of this is that if variables other than x1 had relatively little
effect on the outcome b, then b would be fairly orthogonal to the columns of A
other than the first. This would mean that the area spanned by b and these other
columns would be relatively large.
It might be useful to make up some numbers for a two-by-two matrix A, and to
represent its determinant graphically. Then make up a vector for x, and see what the
implied values for b are. Draw the area spanned by B1 and B2 . Does it seem that
the relative size of these areas corresponds to the relative sizes of the two x variables?
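If you would rather let the computer do the arithmetic for that exercise, here is a minimal sketch, assuming numpy and using made-up numbers, showing that the ratios of these areas (determinants) reproduce the x's:

import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
x = np.array([4., 1.])          # a made-up "large" x1 and "small" x2
b = A @ x                       # the implied outcome: [9., 7.]

B1 = A.copy(); B1[:, 0] = b     # replace column 1 of A with b
B2 = A.copy(); B2[:, 1] = b     # replace column 2 of A with b
print(np.linalg.det(B1) / np.linalg.det(A))   # 4.0, which is x1
print(np.linalg.det(B2) / np.linalg.det(A))   # 1.0, which is x2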
Not all systems of equations have a unique solution. Some have infinitely many, and
some have none. Here is one simple example:
\[
\begin{aligned}
3 &= 2x_1 + 2x_2 \\
6 &= 4x_1 + 4x_2
\end{aligned}
\]
In some sense, the second equation gives us no more "information" than the first,
since it simply has all the constants doubled. This system can be satisfied by a lot of
points, all lying along a line. In contrast, the system:
\[
\begin{aligned}
3 &= 2x_1 + 2x_2 \\
6 &= 2x_1 + 2x_2
\end{aligned}
\]
has no solution. Effectively, we have been given two contradictory pieces of
information: by transitivity, they imply that 3 = 6, which is absurd. When we have a
system of n equations in n unknowns, the lack of a unique solution happens if and
only if the equations (possibly in combination) assert the same relationship between
the variables more than once: either that relationship is paired with the same outcome
each time (redundant information, so infinitely many solutions) or with different
outcomes (contradictory information, so no solution). Here are some examples of
systems of equations that assert the same relationship, also represented in matrix form:
\[
\begin{cases} 3 = 2x_1 + 2x_2 \\ 6 = 4x_1 + 4x_2 \end{cases}
\qquad\Rightarrow\qquad
\begin{pmatrix} 3 \\ 6 \end{pmatrix}
= \begin{pmatrix} 2 & 2 \\ 4 & 4 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\]
\[
\begin{cases} 3 = 2x_1 + 2x_2 \\ 6 = 2x_1 + 2x_2 \end{cases}
\qquad\Rightarrow\qquad
\begin{pmatrix} 3 \\ 6 \end{pmatrix}
= \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\]
\[
\begin{cases} 1 = 4x_1 + 2x_2 + 5x_3 \\ 2 = 6x_1 + 4x_2 + 3x_3 \\ 3 = 5x_1 + 3x_2 + 4x_3 \end{cases}
\qquad\Rightarrow\qquad
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 4 & 2 & 5 \\ 6 & 4 & 3 \\ 5 & 3 & 4 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\]
In each case, one of the following is true: two rows are the same, one row is a
multiple of another, or one row is a linear combination of two others. If we look at
the determinants of the matrices on the right-hand side, we'll see something else
these equations have in common (other than the lack of a unique solution): all these
matrices are singular.
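You can confirm this numerically; here is a minimal sketch, assuming numpy is available:

import numpy as np

matrices = [np.array([[2., 2.], [4., 4.]]),
            np.array([[2., 2.], [2., 2.]]),
            np.array([[4., 2., 5.], [6., 4., 3.], [5., 3., 4.]])]

for A in matrices:
    # each determinant is zero (up to rounding) and each rank falls short of the dimension
    print(np.linalg.det(A), np.linalg.matrix_rank(A), A.shape[0])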
So here’s the law for square matrices:
Unique solution ⇔ Full rank ⇔ Linear independence ⇔ Nonsingular ⇔ Invertible
I think that’s it. If there are any other desirable properties of square matrices,
they’re most likely also equivalent.
The old principle about being able to solve “n equations in n unknowns” works if and
only if these are linearly independent equations. What about when you have k
equations in n unknowns? Well, as you probably knew before, k < n generally means
that there is an infinite number of solutions, whereas k > n generally implies no
solution at all.
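A minimal sketch, assuming numpy, of the two non-square cases (the particular numbers are made up for illustration):

import numpy as np

# k < n: one equation, three unknowns -- the rank cannot reach n, so solutions form a plane
A_under = np.array([[1., 2., 3.]])
print(np.linalg.matrix_rank(A_under))            # 1, which is less than 3

# k > n: three equations, two unknowns -- generically no x satisfies all of them exactly
A_over = np.array([[1., 0.],
                   [0., 1.],
                   [1., 1.]])
b_over = np.array([1., 1., 3.])
x, residual, rank, _ = np.linalg.lstsq(A_over, b_over, rcond=None)
print(residual)                                  # nonzero: the system is inconsistent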
Systems of inequalities
Intersection of lines => intersection of halfspaces