Arnoldi Iterations

During the first year of my undergraduate Maths degree at the University of Pavia I was seriously considering switching to a degree course more closely related to Software Engineering. Thankfully I came across the course in Numerical Linear Algebra; I remember that during that course ideas I had not fully understood until that moment became clearer, and I was fascinated in particular by the combination of scientific computing and pure mathematical analysis of the methods treated in the course. My attention was captured by a very interesting algorithm known as the Arnoldi iteration.

Introduction


In this review we will focus our attention on the Arnoldi iteration, a numerical linear algebra method first introduced in the 1950s by W. E. Arnoldi in Arnoldi, W. E. “The principle of minimized iterations in the solution of the matrix eigenvalue problem.” Quarterly of Applied Mathematics 9.1 (1951): 17-29. To understand what this method does and where it can be used, we will take a brief detour through some general methods of numerical linear algebra. The key definition needed to understand the Arnoldi iteration is the following:
Definition – Hessenberg
A matrix H\in \mathbb{C}^{n\times n} is said to be in (upper) Hessenberg form if h_{i\,j}=0\;\forall\, i>j+1, where we call h_{i\,j} the entry of H in the i-th row and j-th column. Therefore H has to be shaped as:

\begin{bmatrix} h_{1\,1}&h_{1\,2}&\dots&\dots&h_{1\,n}\\ h_{2\,1}&h_{2\,2}&\ddots& &\vdots\\ 0 & h_{3\,2} & \ddots&\ddots&\vdots\\ \vdots &\ddots & \ddots& \ddots&h_{n-1\,n}\\ 0 & \dots & 0 & h_{n\, n-1}&h_{n\,n} \end{bmatrix}


We will now try to show why this particular matrix form is of great importance in numerical linear algebra, by considering the eigenvalue problem.
Let us consider a square matrix A\in \mathbb{C}^{n\times n} and let us search for \lambda\in \mathbb{C} such that there exists a nonzero vector \vec{w}\in \mathbb{C}^{n} satisfying the equation A\vec{w}=\lambda\vec{w}. We know from linear algebra that this problem is equivalent to computing the zeros of the characteristic polynomial of A:

p(z)=\det(A-zI)


The zeros of p are the eigenvalues of A, and we call the spectrum of A the set:


\sigma(A)=\Big\{z\in \mathbb{C}\;:\; p(z)=0\Big\}
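As a quick worked example (of our own, just to fix ideas), take

A=\begin{bmatrix}2 & 1\\ 0 & 3\end{bmatrix}

then p(z)=\det(A-zI)=(2-z)(3-z), so \sigma(A)=\{2,3\}.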


A numerical approach to this problem is the power iteration method; a general overview of this method can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997. Briefly, this method consists in studying the sequences:

\vec{v}_{k+1} = \frac{A\vec{v}_k}{||A\vec{v}_k||} \qquad\qquad \mu_{k}=\frac{\vec{v}_k^*A\vec{v}_k}{\vec{v}_k^*\vec{v}_k}


Under the assumptions that A is Hermitian, that the spectrum of A, \sigma(A)=\{\lambda_1,\dots,\lambda_n\}, verifies the inequality chain |\lambda_1|>|\lambda_2|\geq\dots\geq |\lambda_n|, and that the randomly chosen vector \vec{v}_0 has a nonzero component in the direction of \vec{w}_1, the eigenvector associated with \lambda_1, it is possible to prove that the sequence \mu_k converges to \lambda_1; the actual proof can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997.
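As an illustration, a minimal MATLAB-style sketch of the power iteration (the names A, n and maxit are placeholders to be supplied by the reader) could look as follows:

v = randn(n,1);  v = v/norm(v);   % random unit starting vector v_0
for k = 1:maxit
    w = A*v;                      % matrix-vector product, the dominant cost per step
    v = w/norm(w);                % next iterate v_{k+1}
    mu = v'*(A*v);                % Rayleigh quotient, estimate of lambda_1
end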
The main cost of the power iteration method is computing \vec{v}_k, in particular the matrix-vector product A\vec{v}_k; to be precise, computing \vec{v}_k requires k(2n^2-n) flops. It is easy to see that every zero entry of the matrix A reduces this computational cost, therefore we aim to represent A in the form with the greatest number of zero entries. If A is Hermitian, the spectral theorem tells us that A can be diagonalized, therefore \exists M\in \mathbb{C}^{n\times n} such that A=MDM^{-1}, with D being a diagonal matrix. Clearly the diagonal form would be the best in terms of zero entries; unfortunately, knowing this decomposition is equivalent to knowing the eigenvalues of A, and that is exactly the problem we are addressing. The next best matrix forms in terms of zero entries are the tridiagonal, the triangular and the Hessenberg form. If we try to express A in triangular form by unitary changes of basis we need to perform a Schur decomposition. The problem with this approach is the cost of the Schur decomposition, which is computed by the QR algorithm and has a base cost of \mathcal{O}(n^3); details about the Schur decomposition can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997 and Quarteroni, Alfio, Riccardo Sacco, and Fausto Saleri. Numerical mathematics. Vol. 37. Springer Science & Business Media, 2010. We will come back to the cost of the QR algorithm later on, because reducing a matrix to Hessenberg form is a great advantage when computing the QR decomposition; more information regarding the QR algorithm can be found in Watkins, David S. “Understanding the QR algorithm.” SIAM review 24.4 (1982): 427-440.
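For reference, both reductions mentioned above are available as dense built-ins in MATLAB, so one can experiment with them directly (assuming a square matrix A is already defined):

[U, T] = schur(A);    % Schur form: A = U*T*U', with T upper (quasi-)triangular
[P, H] = hess(A);     % Hessenberg form: A = P*H*P', with H upper Hessenberg

Both reductions cost \mathcal{O}(n^3) for a dense A; the advantage of the Arnoldi iteration discussed below is that it builds the Hessenberg form one column at a time.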
If A is a Hermitian matrix, the Lanczos theorem tells us that we can reduce A to tridiagonal form by unitary transformations.

Theorem – Lanczos
Let A\in \mathbb{C}^{n\times n} be a Hermitian matrix. There exists a unitary matrix Q\in \mathbb{C}^{n\times n} such that the following equation is verified:


A=QTQ^*


with T being a tridiagonal matrix.
Proof
We will prove this by explicitly building the matrices T and Q; in this way the reader can get an idea of how the Lanczos algorithm works. Let us choose a random vector \vec{v}_1 of unit norm, and let us define the following quantities: \alpha_1=(A\vec{v}_1)^*\vec{v}_1 and \beta_1=0. Then for all i\in \{2,\dots, n\} we compute the sequences:

\alpha_i=(A\vec{v}_i)^*\vec{v}_i\qquad \beta_i=||A\vec{v}_{i-1}-\alpha_{i-1}\vec{v}_{i-1}-\beta_{i-1}\vec{v}_{i-2}||

\vec{v}_0=\vec{0}\qquad \vec{v}_i=\frac{1}{\beta_i}(A\vec{v}_{i-1}-\alpha_{i-1}\vec{v}_{i-1}-\beta_{i-1}\vec{v}_{i-2})

In case \beta_i is zero we take as \vec{v}_i a random vector of unit norm orthogonal to the vectors already computed. Now we define the following matrices:

Q=\begin{bmatrix} \uparrow & &\uparrow\\  \vec{v}_1&\dots& \vec{v}_n\\ \downarrow & & \downarrow \end{bmatrix}\qquad \qquad T=\begin{bmatrix} \alpha_1 & \beta_2 & &\\         \beta_2 & \alpha_2 & \ddots&\\         &\ddots & \ddots & \beta_n\\        & &\beta_n & \alpha_n \end{bmatrix}


One can check that the vectors \vec{v}_i are orthonormal, so Q is unitary, and carrying out the matrix multiplication, thanks to how we have defined Q and T, we obtain the equation A=QTQ^*.
Further details on the Lanczos method can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997, Quarteroni, Alfio, Riccardo Sacco, and Fausto Saleri. Numerical mathematics. Vol. 37. Springer Science & Business Media, 2010 and Golub, Gene H., and Dianne P. O’Leary. “Some history of the conjugate gradient and Lanczos algorithms: 1948–1976.” SIAM review 31.1 (1989): 50-102.
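Following the construction in the proof, a minimal MATLAB-style Lanczos sketch could look like this (the size n and the random symmetric test matrix are placeholder choices; on breakdown we simply stop instead of restarting with a fresh orthogonal vector):

n = 50;  A = randn(n);  A = (A + A')/2;   % random symmetric (Hermitian) test matrix
V = zeros(n,n);  alpha = zeros(n,1);  beta = zeros(n,1);
v = randn(n,1);  v = v/norm(v);  V(:,1) = v;
alpha(1) = real(v'*(A*v));
w = A*v - alpha(1)*v;                     % no v_0 term at the first step
for i = 2:n
    beta(i) = norm(w);
    if beta(i) == 0, break, end           % breakdown
    V(:,i) = w/beta(i);
    w = A*V(:,i) - beta(i)*V(:,i-1);
    alpha(i) = real(V(:,i)'*w);
    w = w - alpha(i)*V(:,i);
end
T = diag(alpha) + diag(beta(2:n),1) + diag(beta(2:n),-1);   % tridiagonal T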


Unfortunately, for a non-Hermitian matrix nothing tells us that we can represent the matrix in tridiagonal form; in fact, the best we can hope to achieve is to represent the matrix in Hessenberg form. From what we have said so far we know that, if A is Hermitian, to apply the power iteration we want to represent it in tridiagonal form; but what happens if A is not Hermitian, or not even normal?
Well, if A is not Hermitian the convergence result stated above for the power iteration no longer applies; in that case the standard tool is the QR algorithm. Briefly, the QR algorithm consists in studying the sequence:

A_{k+1}=Q_k^*A_kQ_k


with A_k=Q_kR_k being the QR factorization of A_k. It is possible to prove that, under suitable hypotheses, A_k converges to an upper triangular matrix; therefore we have performed a Schur decomposition and we can find the eigenvalues of A along the diagonal entries of A_k. Further details on the QR algorithm can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997, Quarteroni, Alfio, Riccardo Sacco, and Fausto Saleri. Numerical mathematics. Vol. 37. Springer Science & Business Media, 2010 and Watkins, David S. “Understanding the QR algorithm.” SIAM review 24.4 (1982): 427-440. What matters here is that, if A is expressed in Hessenberg form, the cost of each QR iteration drops from \mathcal{O}(n^3) to \mathcal{O}(n^2).
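As a rough sketch of the iteration just described (maxit is a placeholder, and a practical implementation would first reduce A to Hessenberg form and use shifts), the unshifted QR algorithm reads:

Ak = A;
for k = 1:maxit
    [Qk, Rk] = qr(Ak);   % QR factorization of the current iterate
    Ak = Rk*Qk;          % equal to Qk'*Ak*Qk, hence similar to A
end
d = diag(Ak);            % approximate eigenvalues of A along the diagonal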

Algorithm



So far we have understood that, given a matrix A\in \mathbb{C}^{n\times n}, if we want to find its largest eigenvalue \lambda_1 we want A to be transformed by unitary transformations into a tridiagonal matrix if A is Hermitian, or to be represented in Hessenberg form otherwise. To be precise, we just want a method that represents A in Hessenberg form by unitary transformations, because unitary transformations preserve symmetry and a Hessenberg matrix that is also Hermitian can only be tridiagonal. Such a method is exactly the Arnoldi iteration.


Theorem – Arnoldi
Let A\in \mathbb{C}^{n\times n}. There exists a unitary matrix Q\in \mathbb{C}^{n\times n} such that the following equation is verified:

H=Q^*AQ

with H being a matrix expressed in Hessenberg form.

Proof

We will prove this result as well by direct construction of the decomposition shown in the previous equation, even though in very peculiar cases this construction might fail, as we can see in Embree, Mark. “The Arnoldi eigenvalue iteration with exact shifts can fail.” SIAM Journal on Matrix Analysis and Applications 31.1 (2009): 1-10. Such failures are due to computational problems and not to the linear algebra argument standing behind the algorithm. Let us start from a random vector \vec{q}_1 with unit norm. We then build the sequence of vectors \vec{q}_k, for every k\in \{2,\dots,n\}, such that the following equations are verified:

\vec{u}_k = A\vec{q}_{k-1}-\sum_{j=1}^{k-1}\big(\vec{q}_j^*A\vec{q}_{k-1}\big)\vec{q}_j\qquad\qquad \vec{q}_k=\frac{\vec{u}_k}{||\vec{u}_k||}

We also define the entries of a matrix H\in \mathbb{C}^{n\times n} as:

h_{j\,k}=\vec{q}_j^*A\vec{q}_k\quad\text{for } j\leq k,\qquad h_{k+1\,k}=||\vec{u}_{k+1}||,\qquad h_{j\,k}=0\quad\text{for } j>k+1

Now by matrix multiplication we can show that the matrices H and Q verify H=Q^*AQ, with Q being the matrix defined as:

Q=\begin{bmatrix} \uparrow & &\uparrow\\ \vec{q}_1&\dots& \vec{q}_n\\ \downarrow & & \downarrow \end{bmatrix}

By breaking down this construction we can easily obtain a method to perform the Arnoldi iteration.

Algorithm – Arnoldi

We can write such a method in pseudocode as follows:

q1 = ones(n,1);                      % starting vector (any nonzero vector works)
q1 = q1/norm(q1);                    % normalize to unit norm
Q = zeros(n,n); Q(:,1) = q1;         % orthonormal basis vectors stored as columns
H = zeros(n,n);                      % Hessenberg matrix
for k = 1:n
    z = A*Q(:,k);                    % expand the Krylov subspace
    for i = 1:k
        H(i,k) = Q(:,i)'*z;          % Gram-Schmidt coefficient h_{i,k}
        z = z - H(i,k)*Q(:,i);       % orthogonalize against q_i
    end
    if k < n
        H(k+1,k) = norm(z);
        if H(k+1,k) == 0, return, end    % breakdown: an invariant subspace was found
        Q(:,k+1) = z/H(k+1,k);       % next basis vector q_{k+1}
    end
end

If we compare the Arnoldi algorithm with the Gram-Schmidt algorithm for QR factorization (the code can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997 and Quarteroni, Alfio, Riccardo Sacco, and Fausto Saleri. Numerical mathematics. Vol. 37. Springer Science & Business Media, 2010), we immediately understand that the Arnoldi iteration is nothing more than the Gram-Schmidt method applied to the matrix:

\begin{bmatrix}     \uparrow & \uparrow & & \uparrow \\     \vec{q}_1 & A\vec{q}_1 & \dots & A^{n-1}\vec{q}_1\\     \downarrow & \downarrow &  & \downarrow     \end{bmatrix}

This matrix is well known in numerical linear algebra and goes by the name of Krylov matrix of order n associated with A and \vec{q}_1; we will often refer to such a matrix simply by writing K_n(A,\vec{q}_1). More information regarding Krylov matrices can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997; for now we are only concerned with the fact that H is the representation of the matrix A projected onto the Krylov subspace \mathcal{K}_n(A,\vec{b}), defined as the subspace spanned by the vectors \vec{q}_1,A\vec{q}_1,\dots,A^{n-1}\vec{q}_1. What we have just said leads to the following result:

Proposition

Let A\in \mathbb{C}^{n\times n} and let K_n(A,\vec{b}) be the Krylov matrix of order n associated with A and \vec{b}. Then the following equation is verified:

H=Q^*AQ

with H being the Hessenberg representation produced by the Arnoldi iteration started from \vec{q}_1=\vec{b}/||\vec{b}||, and with K_n(A,\vec{b})=QR being the QR decomposition of K_n(A,\vec{b}).
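We can check this numerically with a small sketch (the size n=6 and the random A and \vec{b} below are arbitrary choices): the Q factor of the Krylov matrix turns Q'*A*Q into an upper Hessenberg matrix.

n = 6;  A = randn(n);  b = randn(n,1);
K = zeros(n,n);  K(:,1) = b;
for j = 2:n
    K(:,j) = A*K(:,j-1);             % columns b, A*b, ..., A^(n-1)*b
end
[Q, R] = qr(K);                      % QR factorization of the Krylov matrix
H = Q'*A*Q;
disp(max(max(abs(tril(H,-2)))));     % entries below the first subdiagonal: numerically zero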

Now a very legitimate question arises: why was the Schur decomposition discarded as a method to produce a zero-filled representation of A for being too costly, in favor of the Arnoldi iteration, which in the form presented so far still has cost \mathcal{O}(n^3)? This question shines a light on the true beauty of the Arnoldi iteration: its iterative nature. What we mean when we speak about iterative nature is the fact that, if we stop the Arnoldi method before computing the vector \vec{q}_{m+1}, we have built a Hessenberg matrix H_m\in \mathbb{C}^{m\times m} and a matrix Q_m\in \mathbb{C}^{n\times m} with orthonormal columns such that the following equation is verified:

H_m=Q_m^*AQ_m

In the next section we will discuss why, to solve the problem presented in the Introduction, i.e. computing the eigenvalues \lambda_i, it can be enough to study the matrix H_m with m\ll n. Clearly, studying the matrix H_m with m\ll n means working with an approximation of A, and the cost needed to represent this approximation in a zero-filled form is \mathcal{O}(m^3), with \mathcal{O}(m^3)\ll \mathcal{O}(n^3). To simplify the discussion, we will refer to the following as the Arnoldi decomposition of order m:

A\approx Q_mH_mQ_m^*


where Q_m and H_m are the same as in the previous equation. Further information regarding the Arnoldi iteration can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997, Watkins, David S. “Some perspectives on the eigenvalue problem.” SIAM review 35.3 (1993): 430-471 and Saad, Yousef. Numerical methods for large eigenvalue problems: revised edition. Vol. 66. Siam, 2011.

Spectral properties of Arnoldi iterations


Let us consider a matrix A\in \mathbb{C}^{n\times n} with spectrum \sigma(A)=\{\lambda_1,\dots,\lambda_n\} verifying the following chain of inequalities: |\lambda_1|>|\lambda_2|>\dots>|\lambda_n|, exactly the same scenario as in the previous discussion. To compute the eigenvalues of A we could perform an Arnoldi decomposition of order n and then apply the QR algorithm; the question we are going to ask ourselves is what happens if, instead of performing an Arnoldi decomposition of order n, we perform an Arnoldi decomposition of order m<n and then apply the QR algorithm.
The basic idea that permeates this section is the following: if we consider the Arnoldi decomposition of order m shown before, we know that H_m is the projection of A onto the Krylov subspace of order m associated with A and \vec{b}=\vec{q}_1, \mathcal{K}_m(A,\vec{b}); therefore we expect H_m to preserve some information regarding the spectral properties of A. Furthermore, considering the structure of \mathcal{K}_m(A,\vec{b}) and the fact that, when multiplying by A, the component in the direction of \vec{w}_1 gets amplified more than the others, since \lambda_1 is the largest eigenvalue, we expect spectral information regarding \lambda_1 in particular to resonate in H_m more than the rest. We can verify numerically that this reasoning works by computing the eigenvalues of A and H_m and observing that the eigenvalue approximations are not too far off, and furthermore that the approximations get closer and closer as m approaches n.
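As a rough sketch of such a check (the test matrix, its prescribed spectrum with \lambda_1=20 and all parameter values below are our own choices for illustration), one can run a few Arnoldi steps and compare the largest eigenvalue of H_m with \lambda_1:

n = 200;  m = 5;
d = [20; linspace(1, 10, n-1)'];      % prescribed spectrum, lambda_1 = 20
[W, ~] = qr(randn(n));                % random orthogonal similarity
A = W*diag(d)*W';                     % matrix with known eigenvalues d
b = randn(n,1);
Q = zeros(n,m+1);  H = zeros(m+1,m);  Q(:,1) = b/norm(b);
for k = 1:m
    z = A*Q(:,k);
    for i = 1:k
        H(i,k) = Q(:,i)'*z;  z = z - H(i,k)*Q(:,i);
    end
    H(k+1,k) = norm(z);  Q(:,k+1) = z/H(k+1,k);
end
ritz = eig(H(1:m,1:m));               % eigenvalues of H_m (Ritz values)
fprintf('lambda_1 = 20, largest Ritz value = %.4f\n', max(real(ritz)));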


As explained before, the spectral information resonates particularly for the largest eigenvalues; therefore we would expect, for m\ll n, that the matrix H_m can still provide useful information regarding \lambda_1, as is clearly visible in the figure above, where for m=5 the largest eigenvalue of H_m is very close to \lambda_1=20. In this section we will present the standard explanation of why such a phenomenon occurs; in particular we present the explanation proposed in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997, Watkins, David S. “Some perspectives on the eigenvalue problem.” SIAM review 35.3 (1993): 430-471 and Greenbaum, Anne, and Lloyd N. Trefethen. “GMRES/CR and Arnoldi/Lanczos as matrix approximation problems.” SIAM Journal on Scientific Computing 15.2 (1994): 359-368. But before presenting this idea we need to characterize the Arnoldi iteration further, and we will do so thanks to the next result:


Proposition
Let A\in\mathbb{C}^{n\times n} and \vec{b}\in \mathbb{C}^n. If K_m(A,\vec{b}) has full rank, then there is a monic polynomial p^m of degree m that minimizes ||p(A)\vec{b}||, with p ranging over the monic polynomials of degree m. Furthermore, p^m is unique and is given by:

p^m(x)=\det(xI-H_m)

Proof
Since p^m is monic, the following equation holds for some y\in\mathbb{C}^{m}:

p^m(A)b=A^mb-Q_my

where K_m(A,b)=Q_mR_m is the QR factorization of K_m(A,b).


Readers familiar with the least squares problem (details can be found in Trefethen, Lloyd N., and David Bau III. Numerical linear algebra. Vol. 50. Siam, 1997, Watkins, David S. “Some perspectives on the eigenvalue problem.” SIAM review 35.3 (1993): 430-471 and Quarteroni, Alfio, Riccardo Sacco, and Fausto Saleri. Numerical mathematics. Vol. 37. Springer Science & Business Media, 2010) will easily recognize that minimizing the norm in the previous equation is a least squares problem; therefore we know that p^m(A)\vec{b} is orthogonal to \mathcal{K}_m(A,\vec{b}). This means that Q_m^*p^m(A)\vec{b}=\vec{0} and, using the Arnoldi decomposition of order m, that Q_m^*p^m(Q_mH_mQ_m^*)\vec{b}=Q_m^*Q_mp^m(H_m)Q_m^*\vec{b}=p^m(H_m)Q_m^*\vec{b}=\vec{0}; from this one can deduce that p^m(H_m)=0, and so, consistently with the Cayley-Hamilton theorem, p^m is the characteristic polynomial of H_m.


Now let us start by tackling the first part of the problem proposed before the last proposition, i.e. why do the eigenvalues of H_m get closer and closer to the eigenvalues of A as m approaches n?
We know that ||p^m(A)\vec{b}|| must be as small as possible; in particular, if p is the characteristic polynomial of A we have ||p(A)\vec{b}||=0, and that is as small as it gets. Therefore ||p^m(A)\vec{b}|| gets closer and closer to ||p(A)\vec{b}||, so p^m as well becomes closer and closer to p. Later on we will try to take a different approach to formally prove that p^m converges to p. The second part of the problem presented above is the convergence speed: even if we gave a reasonable idea of why p^m converges to p, we have not yet explained why, even for a small m, p^m is a good approximation of p in terms of roots.


In particular, we observe numerically that, if we call \lambda^{(m)}_1 the largest eigenvalue of H_m, then \lambda^{(m)}_1 converges geometrically to \lambda_1; such a phenomenon can be observed in the figure above. To get an idea of why this phenomenon occurs we shall take into consideration the matrix:

A=\begin{bmatrix}\lambda & 1 & &\\ & 0 &1&\\& & \ddots & \ddots\\& & & 0 \end{bmatrix}


The characteristic polynomial of A is p(z)=z^{n-1}(z-\lambda). We now consider \overset{\sim}{p}_m(z)=z^{m-1}\Big(z-\overset{\sim}{\lambda}\Big), with \overset{\sim}{\lambda} close to \lambda, as a candidate for the characteristic polynomial of H_m. Since the minimizing polynomial must keep ||\overset{\sim}{p}_m(A)\vec{b}|| small, its value at z=\lambda cannot be allowed to grow large, so we require it to stay of order one:

1\approx \big|\overset{\sim}{p}_m(\lambda)\big|=|\lambda|^{m-1}\,\big|\overset{\sim}{\lambda}-\lambda\big|

therefore \big|\overset{\sim}{\lambda}-\lambda\big|\approx \Big(\frac{1}{|\lambda|}\Big)^{m-1}, which, when |\lambda|>1, means that the convergence is geometric. Clearly the ideas presented in this section do not allow us to fully understand how the Arnoldi iteration locates the eigenvalues; in fact, the way the Arnoldi iteration locates the eigenvalues is still an open question.
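A quick experiment (the values n=30, \lambda=2 and the range of m below are our own choices) that runs the Arnoldi loop from the Algorithm section on the matrix above and prints the error in the dominant Ritz value for increasing m illustrates this geometric decay:

n = 30;  lambda = 2;  b = randn(n,1);
A = diag([lambda, zeros(1,n-1)]) + diag(ones(n-1,1), 1);   % the matrix above
for m = 2:2:8
    Q = zeros(n,m+1);  H = zeros(m+1,m);  Q(:,1) = b/norm(b);
    for k = 1:m
        z = A*Q(:,k);
        for i = 1:k
            H(i,k) = Q(:,i)'*z;  z = z - H(i,k)*Q(:,i);
        end
        H(k+1,k) = norm(z);  Q(:,k+1) = z/H(k+1,k);
    end
    err = abs(max(real(eig(H(1:m,1:m)))) - lambda);
    fprintf('m = %d   error = %.2e   (1/lambda)^(m-1) = %.2e\n', m, err, (1/lambda)^(m-1));
end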
