We begin with a proof by induction (Proof Technique
I) of the first statement in the conclusion of the theorem. We use induction on the dimension of
\(V\) to show that if
\(\ltdefn{T}{V}{V}\) is a linear transformation, then there is a basis
\(B\) for
\(V\) such that the matrix representation of
\(T\) relative to
\(B\text{,}\) \(\matrixrep{T}{B}{B}\text{,}\) is an upper triangular matrix.
To start, suppose that
\(\dimension{V}=1\text{.}\) Choose any nonzero vector
\(\vect{v}\in V\) and realize that
\(V=\spn{\set{\vect{v}}}\text{.}\) Then
\(\lteval{T}{\vect{v}}=\beta\vect{v}\) for some
\(\beta\in\complexes\text{,}\) which determines
\(T\) uniquely (
Theorem LTDB). This description of
\(T\) also gives us a matrix representation relative to the basis
\(B=\set{\vect{v}}\) as the
\(1\times 1\) matrix with lone entry equal to
\(\beta\text{.}\) And this matrix representation is upper triangular (
Definition UTM).
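In concrete terms, if for instance \(\lteval{T}{\vect{v}}=3\vect{v}\text{,}\) then \(\matrixrep{T}{B}{B}=\left[3\right]\text{,}\) a matrix with no entries below the diagonal at all. (The scalar 3 is just an illustrative choice of \(\beta\text{.}\))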
For the induction step let
\(\dimension{V}=m\text{,}\) and assume the theorem is true for every linear transformation defined on a vector space of dimension less than
\(m\text{.}\) By
Theorem EMHE (suitably converted to the setting of a linear transformation),
\(T\) has at least one eigenvalue, and we denote this eigenvalue as
\(\lambda\text{.}\) (We will remark later about how critical this step is.) We now consider properties of the linear transformation
\(\ltdefn{T-\lambda I_V}{V}{V}\text{.}\)
Let \(\vect{x}\) be an eigenvector of \(T\) for \(\lambda\text{.}\) By definition \(\vect{x}\neq\zerovector\text{.}\) Then
\begin{align*}
\lteval{\left(T-\lambda I_V\right)}{\vect{x}}
&=\lteval{T}{\vect{x}}-\lambda\lteval{I_V}{\vect{x}}&&
\text{Theorem VSLT}\\
&=\lteval{T}{\vect{x}}-\lambda\vect{x}&&
\text{Definition IDLT}\\
&=\lambda\vect{x}-\lambda\vect{x}&&
\text{Definition EELT}\\
&=\zerovector&&
\text{Property AI}\text{.}
\end{align*}
So
\(T-\lambda I_V\) is not injective, as it has a nontrivial kernel (
Theorem KILT). Since the kernel is nontrivial, \(\nullity{T-\lambda I_V}\geq 1\text{,}\) and an application of
Theorem RPNDD bounds the rank of
\(T-\lambda I_V\text{,}\)
\begin{align*}
\rank{T-\lambda I_V}&=\dimension{V}-\nullity{T-\lambda I_V}\leq m-1\text{.}
\end{align*}
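As a concrete check of this bound, here is a small numerical sketch in Python with NumPy. The \(4\times 4\) matrix and the choice of eigenvalue are illustrative assumptions standing in for \(T\) and \(\lambda\text{,}\) not taken from the text.
\begin{verbatim}
import numpy as np

# A hypothetical 4x4 matrix standing in for T (so m = 4).
A = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 3., 1.],
              [0., 0., 0., 3.]])
lam = 2.0                        # an eigenvalue of A
M = A - lam * np.eye(4)          # represents T - lambda I_V
print(np.linalg.matrix_rank(M))  # 3, which is at most m - 1 = 3
\end{verbatim}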
Let \(W\) be the subspace of \(V\) that is the range of \(T-\lambda I_V\text{,}\) \(W=\rng{T-\lambda I_V}\text{,}\) and define \(k=\dimension{W}\leq m-1\text{.}\) We define a new linear transformation \(S\text{,}\) on \(W\text{,}\)
\begin{gather*}
\ltdefn{S}{W}{W},\quad\lteval{S}{\vect{w}}=\lteval{T}{\vect{w}}\text{.}
\end{gather*}
It may not look as if we have accomplished much, since the action of \(S\) is identical to the action of \(T\text{.}\) For our purposes this will be a good thing. What is different is the domain and codomain. \(S\) is defined on \(W\text{,}\) a vector space with dimension less than \(m\text{,}\) and so is susceptible to our induction hypothesis. Verifying that \(S\) is really a linear transformation is almost entirely routine, with one exception. Employing \(T\) in our definition of \(S\) raises the possibility that the outputs of \(S\) will not be contained within \(W\) (instead lying inside \(V\) but outside \(W\)). To examine this possibility, suppose that \(\vect{w}\in W\text{.}\) We have
\begin{align*}
\lteval{S}{\vect{w}}
&=\lteval{T}{\vect{w}}\\
&=\lteval{T}{\vect{w}}+\zerovector&&
\text{Property Z}\\
&=\lteval{T}{\vect{w}}+\left(\lambda\lteval{I_V}{\vect{w}}-\lambda\lteval{I_V}{\vect{w}}\right)&&
\text{Property AI}\\
&=\left(\lteval{T}{\vect{w}}-\lambda\lteval{I_V}{\vect{w}}\right)+\lambda\lteval{I_V}{\vect{w}}&&
\text{Property AA}\\
&=\left(\lteval{T}{\vect{w}}-\lambda\lteval{I_V}{\vect{w}}\right)+\lambda\vect{w}&&
\text{Definition IDLT}\\
&=\lteval{\left(T-\lambda I_V\right)}{\vect{w}}+\lambda\vect{w}&&
\text{Theorem VSLT}\text{.}
\end{align*}
Since
\(W\) is the range of
\(T-\lambda I_V\text{,}\) \(\lteval{\left(T-\lambda I_V\right)}{\vect{w}}\in W\text{.}\) And by
Property SC,
\(\lambda\vect{w}\in W\text{.}\) Finally, applying
Property AC we see by closure that the sum is in
\(W\) and so we conclude that
\(\lteval{S}{\vect{w}}\in W\text{.}\) This argument convinces us that it is legitimate to define
\(S\) as we did with
\(W\) as the codomain.
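This closure argument can be traced in the numerical sketch begun above: compute an orthonormal basis for \(W=\rng{T-\lambda I_V}\) and check that the images under the matrix stay inside \(W\text{.}\) The singular value decomposition and the tolerance are implementation conveniences, not part of the proof.
\begin{verbatim}
# Continuing the sketch: A, lam, M as above.
U, s, _ = np.linalg.svd(M)
k = int(np.sum(s > 1e-10))   # k = dim W, here 3 <= m - 1
Wb = U[:, :k]                # columns: an orthonormal basis of W
AW = A @ Wb                  # images of the basis vectors under T
# Projecting the images onto W changes nothing, so S(w) = T(w)
# really does lie in W, as the proof asserts.
assert np.allclose(Wb @ (Wb.conj().T @ AW), AW)
\end{verbatim}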
\(S\) is a linear transformation defined on a vector space with dimension \(k\text{,}\) less than \(m\text{,}\) so we can apply the induction hypothesis and conclude that \(W\) has a basis, \(C=\set{\vectorlist{w}{k}}\text{,}\) such that the matrix representation of \(S\) relative to \(C\) is an upper triangular matrix.
Beginning with the linearly independent set
\(C\text{,}\) repeatedly apply
Theorem ELIS to add vectors to
\(C\text{,}\) maintaining a linearly independent set and spanning ever larger subspaces of
\(V\text{.}\) This process will end with the addition of
\(m-k\) vectors, which together with
\(C\) will span all of
\(V\text{.}\) Denote these vectors as
\(D=\set{\vectorlist{u}{{m-k}}}\text{.}\) Then
\(B=C\cup D\) is a basis for
\(V\text{,}\) and is the basis we desire for the conclusion of the theorem. So we now consider the matrix representation of
\(T\) relative to
\(B\text{.}\)
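Before examining that representation, we note that the repeated application of Theorem ELIS is itself a small algorithm: keep adjoining standard unit vectors, retaining only those that preserve linear independence. Here is a minimal sketch; the helper extend_basis is hypothetical, and the rank computation with a tolerance stands in for the exact independence test of the theorem.
\begin{verbatim}
import numpy as np

def extend_basis(C, tol=1e-10):
    # Extend the independent columns of C to a basis of C^m,
    # in the spirit of repeated applications of Theorem ELIS.
    m = C.shape[0]
    B = C
    for i in range(m):
        if B.shape[1] == m:
            break                    # we already have m vectors
        e = np.zeros((m, 1)); e[i] = 1.0
        candidate = np.hstack([B, e])
        if np.linalg.matrix_rank(candidate, tol=tol) == candidate.shape[1]:
            B = candidate            # e lies outside the span; keep it
    return B                         # m independent columns: a basis
\end{verbatim}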
Since the definition of \(T\) and \(S\) agree on \(W\text{,}\) the first \(k\) columns of \(\matrixrep{T}{B}{B}\) will have the upper triangular matrix representation of \(S\) in the first \(k\) rows. The remaining \(m-k\) rows of these first \(k\) columns will be all zeros since the outputs of \(T\) for basis vectors from \(C\) are all contained in \(W\) and hence are linear combinations of the basis vectors in \(C\text{.}\) The situation for \(T\) on the basis vectors in \(D\) is not quite as pretty, but it is close.
For \(1\leq i\leq m-k\text{,}\) consider
\begin{align*}
\vectrep{B}{\lteval{T}{\vect{u}_i}}
&=\vectrep{B}{\lteval{T}{\vect{u}_i}+\zerovector}&&
\text{Property Z}\\
&=\vectrep{B}{\lteval{T}{\vect{u}_i}+\left(\lambda\lteval{I_V}{\vect{u}_i}-\lambda\lteval{I_V}{\vect{u}_i}\right)}&&
\text{Property AI}\\
&=\vectrep{B}{\left(\lteval{T}{\vect{u}_i}-\lambda\lteval{I_V}{\vect{u}_i}\right)+\lambda\lteval{I_V}{\vect{u}_i}}&&
\text{Property AA}\\
&=\vectrep{B}{\left(\lteval{T}{\vect{u}_i}-\lambda\lteval{I_V}{\vect{u}_i}\right)+\lambda\vect{u}_i}&&
\text{Definition IDLT}\\
&=\vectrep{B}{\lteval{\left(T-\lambda I_V\right)}{\vect{u}_i}+\lambda\vect{u}_i}&&
\text{Theorem VSLT}\\
&=\vectrep{B}{\lincombo{a}{w}{k}+\lambda\vect{u}_i}&&
\text{Definition RLT}\\
&=\colvector{a_1\\a_2\\\vdots\\a_k\\0\\\vdots\\0\\\lambda\\0\\\vdots\\0}&&
\text{Definition VR}\text{.}
\end{align*}
In the penultimate equality, we have rewritten an element of the range of \(T-\lambda I_V\) as a linear combination of the basis vectors, \(C\text{,}\) for the range of \(T-\lambda I_V\text{,}\) \(W\text{,}\) using the scalars \(\scalarlist{a}{k}\text{.}\) If we incorporate these \(m-k\) column vectors into the matrix representation \(\matrixrep{T}{B}{B}\) we find \(m-k\) occurrences of \(\lambda\) on the diagonal, with any other nonzero entries lying only in the first \(k\) rows. Together with the \(k\times k\) upper triangular representation in the upper left-hand corner, the entire matrix representation for \(T\) is clearly upper triangular. This completes the induction step. So for any linear transformation there is a basis that creates an upper triangular matrix representation.
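In block form, the representation we have assembled is
\begin{align*}
\matrixrep{T}{B}{B}&=
\begin{bmatrix}
\matrixrep{S}{C}{C}&\ast\\
0&\lambda I_{m-k}
\end{bmatrix}
\end{align*}
where the upper-left block is upper triangular by the induction hypothesis, and the \(\ast\) block collects the scalars \(a_j\) from the computation above. The induction is also constructive, and we can trace it in code. Below is a minimal sketch in Python with NumPy that follows the proof step by step: find an eigenvalue, form the range \(W\text{,}\) recurse on the restriction \(S\text{,}\) and extend the resulting basis. The function name, the tolerance, and the use of the singular value decomposition for the range and its complement are our own assumptions; a numerically careful version of this idea is the Schur decomposition.
\begin{verbatim}
import numpy as np

def triangularize(A, tol=1e-10):
    # Return B so that inverse(B) @ A @ B is upper triangular,
    # mirroring the induction in the proof (a sketch, not
    # production code).
    m = A.shape[0]
    if m == 1:
        return np.eye(1, dtype=complex)  # base case: any basis works
    lam = np.linalg.eigvals(A)[0]        # an eigenvalue exists (Theorem EMHE)
    M = A - lam * np.eye(m)              # T - lambda I_V
    U, s, _ = np.linalg.svd(M)
    k = int(np.sum(s > tol))             # k = dim W <= m - 1
    if k == 0:
        return np.eye(m, dtype=complex)  # A = lam*I is already triangular
    Wb = U[:, :k]                        # orthonormal basis for W = range(M)
    S = Wb.conj().T @ A @ Wb             # the restriction S, in this basis
    P = triangularize(S, tol)            # induction hypothesis, dimension k < m
    C = Wb @ P                           # basis of W triangularizing S
    D = U[:, k:]                         # extends C to a basis of C^m
    return np.hstack([C, D])

A = np.random.default_rng(1).standard_normal((5, 5)) + 0j
B = triangularize(A)
T = np.linalg.inv(B) @ A @ B
print(np.allclose(np.tril(T, -1), 0, atol=1e-8))   # True
\end{verbatim}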
We have one more statement in the conclusion of the theorem to verify. The eigenvalues of
\(T\text{,}\) and their multiplicities, can be computed with the techniques of
Chapter E relative to any matrix representation (
Theorem EER). We take this approach with our upper triangular matrix representation
\(\matrixrep{T}{B}{B}\text{.}\) Let
\(d_i\) be the diagonal entry of
\(\matrixrep{T}{B}{B}\) in row
\(i\) and column
\(i\text{.}\) Then the characteristic polynomial, computed as a determinant (
Definition CP) with repeated expansions about the first column, is
\begin{align*}
\charpoly{\matrixrep{T}{B}{B}}{x}&=
\left(d_1-x\right)
\left(d_2-x\right)
\left(d_3-x\right)
\cdots
\left(d_m-x\right)\text{.}
\end{align*}
The roots of the polynomial equation
\(\charpoly{\matrixrep{T}{B}{B}}{x}=0\) are the eigenvalues of the linear transformation (
Theorem EMRCP). So each diagonal entry is an eigenvalue of \(T\text{,}\) and each eigenvalue \(\lambda\) appears on the diagonal exactly \(\algmult{T}{\lambda}\) times, its algebraic multiplicity (
Definition AME).
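This determinant computation is easy to replicate. As an illustration (the \(3\times 3\) upper triangular matrix is a hypothetical example, using Python with SymPy):
\begin{verbatim}
import sympy as sp

x = sp.symbols('x')
U = sp.Matrix([[2, 5, -1],
               [0, 3,  4],
               [0, 0,  2]])
p = (U - x * sp.eye(3)).det()   # characteristic polynomial, Definition CP
print(sp.factor(p))             # -(x - 3)*(x - 2)**2
# The diagonal entry 2 appears twice, so its algebraic
# multiplicity as an eigenvalue is 2.
print(sp.roots(p))              # {2: 2, 3: 1} (up to ordering)
\end{verbatim}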