Proof of Cayley-Hamilton theorem by formal substitutions∗
asteroid†
2013-03-21 19:34:38
Let $A$ be an $n \times n$ matrix with entries in a commutative ring with identity,
and let $p(\lambda) = c_0 + c_1 \lambda + \cdots + c_n \lambda^n$ be its characteristic polynomial. We will
prove that $p(A) := c_0 I + c_1 A + \cdots + c_n A^n = 0$.
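As a quick sanity check of the statement, the $2 \times 2$ case can be worked out by hand. The sketch below assumes the convention $p(\lambda) = \det(A - \lambda I)$; the entry names $a, b, r, s$ are illustrative and stand for arbitrary elements of the ring.

```latex
\[
A = \begin{pmatrix} a & b \\ r & s \end{pmatrix}, \qquad
p(\lambda) = \det(A - \lambda I) = (as - br) - (a + s)\lambda + \lambda^2 ,
\]
so $c_0 = as - br$, $c_1 = -(a+s)$, $c_2 = 1$, and
\[
p(A) = A^2 - (a+s)A + (as - br) I
= \begin{pmatrix} a^2 + br & ab + bs \\ ar + rs & br + s^2 \end{pmatrix}
- \begin{pmatrix} a^2 + as & ab + bs \\ ar + rs & as + s^2 \end{pmatrix}
+ \begin{pmatrix} as - br & 0 \\ 0 & as - br \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
```

Only commutativity of the entries is used, which is consistent with the theorem holding over any commutative ring with identity.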
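Proof (Popular fake proof):
In the expression
$$p(t) = \det(A - tI) = c_0 + c_1 t + \cdots + c_n t^n,$$
substitute $t = A$; then $p(A) = \det(A - AI) = \det(0) = 0$.

The argument is faulty: in $\det(A - tI)$ the symbol $t$ is a scalar multiplying $I$, so replacing it by a matrix changes the meaning of the expression, and $\det(0)$ is a scalar whereas $p(A)$ is an $n \times n$ matrix. But interestingly, there is a way
to rescue it using clever formal substitution arguments. For the moment, we
assume that the matrix $A$ is over the complex field.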
Since the notation $p(\lambda)$ and $p(A)$ can be confusing at first sight, as one
expression takes scalar values and the other matrix values, we will change it.
From now on, we will use the notation $\tilde{p}(t)$ when applying the polynomial $p$ to a
matrix $t$. In this sense $p(\cdot)$ is a function on $\mathbb{C}$ and $\tilde{p}(\cdot)$ is a function on matrices.
Also, by definition,
$$\tilde{p}(t) := c_0 I + c_1 t + \cdots + c_n t^n,$$
where $t$ is a matrix.
Of course, we intend to prove that $\tilde{p}(A) = 0$.
Proof (Proof of the complex case):
For each $\lambda \in \mathbb{C}$ let $B(\lambda)$ be the classical adjoint of $A - \lambda I$. We have
$$B(\lambda)(A - \lambda I) = \det(A - \lambda I)\, I = p(\lambda) I = c_0 I + \lambda (c_1 I) + \cdots + \lambda^n (c_n I). \tag{1}$$
∗ ⟨ProofOfCayleyHamiltonTheoremByFormalSubstitutions⟩ created: ⟨2013-03-21⟩ by: ⟨asteroid⟩ version: ⟨37308⟩ Privacy setting: ⟨1⟩ ⟨Proof⟩ ⟨15A18⟩ ⟨15A15⟩
† This text is available under the Creative Commons Attribution/Share-Alike License 3.0.
You can reuse this document or portions thereof only if you do so under terms that are
compatible with the CC-BY-SA license.

From the definition of the classical adjoint, it is clear that $B(\lambda)$ can be written
as a polynomial in $\lambda$ of degree $\le n - 1$ whose coefficients are matrices.
That is,
$$B(\lambda) = D_0 + D_1 \lambda + \cdots + D_{n-1} \lambda^{n-1},$$
for some constant coefficient matrices $D_i$.
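To make the coefficient matrices $D_k$ concrete, here is a minimal Python sketch that computes them for a small integer matrix and checks identity (1) coefficient by coefficient. It assumes integer entries and uses naive Laplace expansion, which is only sensible for small $n$; all function names (`poly_det`, `adjugate_coefficients`, `check_equation_1`, and so on) are hypothetical helpers introduced for illustration.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_det(M):
    """Determinant of a square matrix of polynomials (Laplace expansion, row 0)."""
    if not M:
        return [1]  # determinant of the empty matrix
    total = [0]
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = poly_mul(entry, poly_det(minor))
        if j % 2:
            term = [-x for x in term]
        total = poly_add(total, term)
    return total


def char_matrix(A):
    """Entries of A - lambda*I as polynomials in lambda."""
    n = len(A)
    return [[[A[i][j], -1] if i == j else [A[i][j]] for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(A - lambda*I)."""
    return poly_det(char_matrix(A))


def adjugate_coefficients(A):
    """Return [D_0, ..., D_{n-1}] with adj(A - lambda*I) = sum_k D_k lambda^k."""
    n = len(A)
    M = char_matrix(A)
    D = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of the adjugate is the (j, i) cofactor of M.
            minor = [row[:i] + row[i + 1:] for r, row in enumerate(M) if r != j]
            sign = -1 if (i + j) % 2 else 1
            for k, coeff in enumerate(poly_det(minor)):
                D[k][i][j] = sign * coeff
    return D


def check_equation_1(A):
    """Check D_k A - D_{k-1} = c_k I for k = 0..n (with D_{-1} = D_n = 0)."""
    n = len(A)
    c = char_poly(A)
    zero = [[0] * n for _ in range(n)]
    D_ext = [zero] + adjugate_coefficients(A) + [zero]
    for k in range(n + 1):
        for i in range(n):
            for j in range(n):
                lhs = sum(D_ext[k + 1][i][l] * A[l][j] for l in range(n))
                lhs -= D_ext[k][i][j]
                if lhs != (c[k] if i == j else 0):
                    return False
    return True


print(adjugate_coefficients([[2, 1], [1, 3]]))
print(check_equation_1([[1, 2, 0], [0, 1, 3], [4, 0, 1]]))
```

Comparing coefficients of $\lambda^k$ on both sides of (1) is exactly what `check_equation_1` does; the proof reaches the same comparison by a different route.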
Now, we would like $B$ to be defined for matrices (just as we passed from the polynomial $p$ to $\tilde{p}$), so we define for every matrix $t$
$$\tilde{B}(t) := D_0 + D_1 t + \cdots + D_{n-1} t^{n-1}.$$
Now consider the following function of matrices:
$$Q(t) := (D_0 A - c_0 I) + (D_1 A - D_0 - c_1 I)\, t + \cdots + (D_{n-1} A - D_{n-2} - c_{n-1} I)\, t^{n-1} + (-D_{n-1} - c_n I)\, t^n. \tag{2}$$
The above expression may look strange, but if the matrix $t$
commutes with all of $D_0, \ldots, D_{n-1}$ and $A$, then expression (2) is easily seen to be
equal to
$$(D_0 + D_1 t + \cdots + D_{n-1} t^{n-1})(A - It) - c_0 I - c_1 t - \cdots - c_n t^n.$$
This means that
$$Q(t) = \tilde{B}(t)(A - It) - \tilde{p}(t) \quad \text{whenever } t \text{ commutes with } D_0, \ldots, D_{n-1}, A. \tag{3}$$
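The step from (2) to (3) can be written out as follows. This is a sketch of the routine expansion, with the conventions $D_{-1} := 0$ and $D_n := 0$ introduced here only to write the sum uniformly.

```latex
\begin{align*}
(D_0 + D_1 t + \cdots + D_{n-1} t^{n-1})(A - It)
  &= \sum_{k=0}^{n-1} D_k t^k A - \sum_{k=0}^{n-1} D_k t^{k+1} \\
  &= \sum_{k=0}^{n-1} D_k A\, t^k - \sum_{k=1}^{n} D_{k-1} t^k
     && \text{(moving $A$ past $t^k$, since $tA = At$)} \\
  &= \sum_{k=0}^{n} (D_k A - D_{k-1})\, t^k .
\end{align*}
Subtracting $c_0 I + c_1 t + \cdots + c_n t^n = \sum_{k=0}^{n} c_k I\, t^k$ gives
\[
\sum_{k=0}^{n} (D_k A - D_{k-1} - c_k I)\, t^k ,
\]
which is expression (2).
```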
The reason for not defining $Q(t)$ by the expression in (3) is that we want $Q(t)$
to be some kind of "polynomial" in $t$ (with matrix coefficients on the left of each
$t^k$).
We now state some properties that can be easily checked by straightforward
calculation:
• $\tilde{p}(\lambda I) = p(\lambda) I$
• $\tilde{B}(\lambda I) = B(\lambda)$
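For completeness, both properties follow from $(\lambda I)^k = \lambda^k I$; a short sketch of the calculation, reading the $k = 0$ term of $\tilde{p}$ as $c_0 I$:

```latex
\[
\tilde{p}(\lambda I) = \sum_{k=0}^{n} c_k (\lambda I)^k
                     = \Big( \sum_{k=0}^{n} c_k \lambda^k \Big) I
                     = p(\lambda) I ,
\qquad
\tilde{B}(\lambda I) = \sum_{k=0}^{n-1} D_k (\lambda I)^k
                     = \sum_{k=0}^{n-1} D_k \lambda^k
                     = B(\lambda) .
\]
```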
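Notice that matrices of the form $\lambda I$ with $\lambda \in \mathbb{C}$ commute with every other
matrix, so that, by (3) and (1),
$$Q(\lambda I) = \tilde{B}(\lambda I)(A - \lambda I) - \tilde{p}(\lambda I) = B(\lambda)(A - \lambda I) - p(\lambda) I = 0.$$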
Now $Q(\lambda I)$ is also a matrix whose entries $q_{ij}(\lambda)$ are polynomials in $\lambda$. Since
$Q(\lambda I) = 0$ we must have $q_{ij}(\lambda) = 0$ for all $\lambda \in \mathbb{C}$. A polynomial with infinitely many roots is
the zero polynomial, so each $q_{ij}$ is the zero polynomial and, since this occurs for all $i, j$, it follows that the matrix
coefficients of $t^k$ occurring in (2) are all zero, i.e. $Q(t)$ is the zero matrix for all
matrices $t$.
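The vanishing of the coefficients in (2) can be observed on a small example. The following Python sketch uses hand-computed data for the illustrative matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ (an assumption made for this example, not part of the proof); `q_coefficients` and `evaluate_Q` are hypothetical helper names.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def q_coefficients(A, c, D):
    """Matrix coefficients of t^0, ..., t^n in expression (2)."""
    n = len(A)
    zero = [[0] * n for _ in range(n)]
    D_ext = [zero] + D + [zero]  # D_{-1} = 0, D_0, ..., D_{n-1}, D_n = 0
    coeffs = []
    for k in range(n + 1):
        DkA = mat_mul(D_ext[k + 1], A)
        coeffs.append([[DkA[i][j] - D_ext[k][i][j] - (c[k] if i == j else 0)
                        for j in range(n)] for i in range(n)])
    return coeffs


def evaluate_Q(coeffs, t):
    """Evaluate sum_k coeffs[k] * t^k, keeping the coefficients on the left."""
    n = len(t)
    total = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # t^0
    for C in coeffs:
        term = mat_mul(C, power)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, t)
    return total


# Hand-computed data for the illustrative matrix A:
A = [[2, 1], [1, 3]]
c = [5, -5, 1]                  # det(A - x I) = 5 - 5x + x^2
D = [[[3, -1], [-1, 2]],        # D_0
     [[-1, 0], [0, -1]]]        # D_1, so adj(A - x I) = D_0 + D_1 x

coeffs = q_coefficients(A, c, D)
t = [[0, 1], [7, -4]]           # an arbitrary matrix that does not commute with A
print(coeffs)
print(evaluate_Q(coeffs, t))
```

Because every coefficient matrix is zero, `evaluate_Q` returns the zero matrix for any `t`, including matrices that do not commute with `A`; this is the point of defining $Q$ by (2) rather than by (3).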
Taking $t = A$ we can also see that
$$\begin{aligned}
Q(A) &= (D_0 A - c_0 I) + (D_1 A - D_0 - c_1 I) A + \cdots + (D_{n-1} A - D_{n-2} - c_{n-1} I) A^{n-1} + (-D_{n-1} - c_n I) A^n \\
     &= -c_0 I - c_1 A - \cdots - c_{n-1} A^{n-1} - c_n A^n \\
     &= -\tilde{p}(A).
\end{aligned}$$
Since $Q(A) = 0$, we get $\tilde{p}(A) = 0$, which finishes the proof.
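The cancellation in the middle step is a telescoping sum. As an illustration (not part of the original argument), here it is written out for $n = 2$:

```latex
\begin{align*}
Q(A) &= (D_0 A - c_0 I) + (D_1 A - D_0 - c_1 I) A + (-D_1 - c_2 I) A^2 \\
     &= D_0 A - c_0 I + D_1 A^2 - D_0 A - c_1 A - D_1 A^2 - c_2 A^2 \\
     &= -(c_0 I + c_1 A + c_2 A^2) .
\end{align*}
```

Each $D_k A^{k+1}$ appears once with a plus sign and once with a minus sign, and no commutation assumption is needed for this step because every coefficient already sits to the left of the power of $A$.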
Actually, $\mathbb{C}$ could have been replaced by $\mathbb{R}$ or $\mathbb{Q}$ in the proof above.
The only property of $\mathbb{C}$ that was used is that it is an infinite integral domain.
Proof (Proof for an arbitrary commutative ring with identity):
Let $A = (a_{ij})$, where the entries $a_{ij}$ are in a commutative ring with identity
$R$. First notice that, since $c_0 + c_1 \lambda + \cdots + c_n \lambda^n = p(\lambda) = \det(A - \lambda I)$, where
$\lambda \in R$, the coefficients $c_0, \ldots, c_n$ are polynomials in $\{a_{ij}\}$ with integer coefficients.
Hence $\tilde{p}(A) := c_0 I + c_1 A + \cdots + c_n A^n$ is a matrix whose entries are also
polynomials in $\{a_{ij}\}$ with integer coefficients. These polynomials vanish for every assignment of
complex numbers to the $\{a_{ij}\}$, because the complex case of the theorem has already been
proven (this would be the same as substituting the matrix $A$ by a matrix with
complex entries). Therefore these polynomials are zero polynomials and we
conclude that $\tilde{p}(A) = 0$, as we intended to prove.
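To see the general statement at work in a ring that is not an integral domain, the following Python sketch checks $\tilde{p}(A) = 0$ for matrices over $\mathbb{Z}/m\mathbb{Z}$ (for example $m = 6$). The choice of ring, the naive Laplace expansion, and the helper names `char_poly_mod` and `eval_poly_at_matrix` are assumptions made for illustration only.

```python
def poly_add(p, q, m):
    """Add coefficient lists (lowest degree first) modulo m."""
    n = max(len(p), len(q))
    return [((p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)) % m
            for i in range(n)]


def poly_mul(p, q, m):
    """Multiply coefficient lists modulo m."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % m
    return out


def poly_det(M, m):
    """Determinant of a matrix of polynomials over Z/mZ (Laplace expansion)."""
    if not M:
        return [1 % m]  # determinant of the empty matrix
    total = [0]
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = poly_mul(entry, poly_det(minor, m), m)
        if j % 2:
            term = [(-x) % m for x in term]
        total = poly_add(total, term, m)
    return total


def char_poly_mod(A, m):
    """Coefficients [c_0, ..., c_n] of det(A - x I) over Z/mZ."""
    n = len(A)
    M = [[[A[i][j] % m, (-1) % m] if i == j else [A[i][j] % m]
          for j in range(n)] for i in range(n)]
    return poly_det(M, m)


def eval_poly_at_matrix(c, A, m):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n over Z/mZ by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for coeff in reversed(c):
        # Horner step: result <- result * A + coeff * I
        result = [[(sum(result[i][l] * A[l][j] for l in range(n))
                    + (coeff if i == j else 0)) % m
                   for j in range(n)] for i in range(n)]
    return result


A = [[2, 3], [4, 5]]
c = char_poly_mod(A, 6)
print(c)
print(eval_poly_at_matrix(c, A, 6))
```

A numerical check of this kind proves nothing by itself; it only illustrates that the identity survives reduction to a ring with zero divisors, as the argument above predicts.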
Comments on other proofs
Yet another proof of the Cayley-Hamilton Theorem is to establish it for diagonalizable matrices, and then by a density argument (i.e. every matrix can be
approximated by diagonalizable ones in an algebraically closed field), we conclude that p(A) = 0 is an identity for all matrices over a field. This kind of
proof is also presented as an exercise in [?].
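The diagonalizable case mentioned above is a one-line computation. In the sketch below, $P$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ are names introduced here for an invertible matrix and the diagonal matrix of eigenvalues of $A$.

```latex
\[
A = P \Lambda P^{-1}
\;\Longrightarrow\;
A^k = P \Lambda^k P^{-1}
\;\Longrightarrow\;
p(A) = P \Big( \sum_{k=0}^{n} c_k \Lambda^k \Big) P^{-1}
     = P \,\mathrm{diag}\big( p(\lambda_1), \ldots, p(\lambda_n) \big)\, P^{-1}
     = 0 ,
\]
since each eigenvalue $\lambda_i$ is a root of the characteristic polynomial $p$.
```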
The “standard approach” can be found in [?]. It proves the result for matrices over any field, but requires no “abstract algebra” (no algebraic closure,
Zariski topology or formal substitutions).
The two other proofs just mentioned can be extended to matrices over an
arbitrary commutative ring simply by repeating the last argument in our proof.
In [?] there is a proof similar to the one presented here (although it has some
errors).
References
[1] Michael Artin. Algebra. Prentice-Hall, 1991.
[2] Martin Braun. Differential equations and their applications: an introduction to applied mathematics, 3rd edition. Springer-Verlag, 1983.
[3] Stephen H. Friedberg, Arnold J. Insel, Lawrence E. Spence. Linear Algebra, 3rd edition. Prentice-Hall, 1997.