\def\cakedir{.}
\def\figdir{./Fig}
\lefthead{Fomel}
\righthead{Conjugate directions}
\footer{SEP--92}
\lstset{language=c,numbers=left,numberstyle=\tiny,showstringspaces=false}

\title{Least-square inversion with inexact adjoints. \\
 Method of conjugate directions: A tutorial}
%\keywords{inversion, algorithm, modeling, linear }
\email{sergey@sep.stanford.edu}
\author{Sergey Fomel}

\maketitle

\begin{abstract}
  This tutorial describes the classic method of conjugate directions: the
  generalization of the conjugate-gradient method in iterative least-square
  inversion. I derive the algebraic equations of the conjugate-direction
  method from general optimization principles. The derivation explains the
  ``magic'' properties of conjugate gradients. It also justifies the use of
  conjugate directions in cases when these properties are distorted either by
  computational errors or by inexact adjoint operators. The extra cost comes
  from storing a larger number of previous search directions in the computer
  memory. A simple program and two examples illustrate the method.
\end{abstract}

\section{Introduction}%%%%

This paper describes the method of conjugate directions for solving linear
operator equations in Hilbert space. This method is usually described in
numerous textbooks on unconstrained optimization as an introduction to the
much more popular method of conjugate gradients. See, for example, {\em
Practical optimization} by \cite{gill} and its bibliography. The famous
conjugate-gradient solver possesses specific properties, well known from the
original works of \cite{hestenes} and \cite{fletcher}. For linear operators
and exact computations, it guarantees finding the solution after, at most,
$n$ iterative steps, where $n$ is the number of dimensions in the solution
space. The method of conjugate gradients does not require explicit
computation of the objective function or explicit inversion of the Hessian
matrix. This makes it particularly attractive for large-scale inverse
problems, such as those of seismic data processing and interpretation.
However, it does require explicit computation of the adjoint operator.
\cite{Claerbout.blackwell.92,iee} shows dozens of successful examples of
conjugate-gradient application with numerically precise adjoint operators.
\par
The motivation for this tutorial is to explore the possibility of using
different types of preconditioning operators in place of adjoints in
iterative least-square inversion. For some linear or linearized operators,
implementing the exact adjoint may pose a difficult problem. For others, one
may prefer different preconditioners because of their smoothness
\cite[]{Claerbout.sep.89.201,Crawley.sep.89.207}, simplicity
\cite[]{kleinman}, or asymptotic properties \cite[]{herman}. In those cases,
we could apply the natural generalization of the conjugate-gradient method,
which is the method of conjugate directions. The cost difference between the
two methods is in the volume of memory storage. In the days when the
conjugate-gradient method was invented, this difference looked too large to
even consider a practical application of conjugate directions. With the
evident increase of computer power over the last 30 years, we can afford to
do it now.
\par
I derive the main equations used in the conjugate-direction method from very
general optimization criteria, with minimum restrictions implied.
The textbook algebra is illustrated with a simple program and two simple
examples.

\section{IN SEARCH OF THE MINIMUM} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We are looking for the solution of the linear operator equation
\begin{equation}
{\bf d = A\,m}\;,
\label{eqn:equation}
\end{equation}
where ${\bf m}$ is the unknown model in the linear model space, ${\bf d}$
stands for the given data, and ${\bf A}$ is the forward modeling operator.
The data vector ${\bf d}$ belongs to a Hilbert space with a defined norm and
dot product. The solution is constructed by iterative steps in the model
space, starting from an initial guess ${\bf m}_0$. Thus, at the $n$-th
iteration, the current model ${\bf m}_n$ is found by the recursive relation
\begin{equation}
{\bf m}_n = {\bf m}_{n-1} + \alpha_n {\bf s}_n\;,
\label{eqn:mn}
\end{equation}
where ${\bf s}_n$ denotes the step direction, and $\alpha_n$ stands for the
scaling coefficient. The residual at the $n$-th iteration is defined by
\begin{equation}
{\bf r}_n = {\bf d - A\,m}_{n}\;.
\label{eqn:residual}
\end{equation}
Substituting (\ref{eqn:mn}) into (\ref{eqn:residual}) leads to the equation
\begin{equation}
{\bf r}_n = {\bf r}_{n-1} - \alpha_n {\bf A\,s}_n\;.
\label{eqn:rn}
\end{equation}
For a given step ${\bf s}_n$, we can choose $\alpha_n$ to minimize the
squared norm of the residual
\begin{equation}
\|{\bf r}_n\|^2 = \|{\bf r}_{n-1}\|^2 -
2\,\alpha_n \left({\bf r}_{n-1},\,{\bf A\,s}_n\right) +
\alpha_n^2\,\|{\bf A\,s}_n\|^2\;.
\label{eqn:rnorm}
\end{equation}
The parentheses denote the dot product, and $\|{\bf x}\|=\sqrt{({\bf
x,\,x})}$ denotes the norm of ${\bf x}$ in the corresponding Hilbert space.
The optimal value of $\alpha_n$ is easily found from equation
(\ref{eqn:rnorm}) to be
\begin{equation}
\alpha_n = {{\left({\bf r}_{n-1},\,{\bf A\,s}_n\right)} \over
{\|{\bf A\,s}_n\|^2}}\;.
\label{eqn:alpha}
\end{equation}
Two important conclusions immediately follow from this fact. First,
substituting the value of $\alpha_n$ from formula (\ref{eqn:alpha}) into
equation (\ref{eqn:rn}) and multiplying both sides of this equation by
${\bf A\,s}_n$, we can conclude that
\begin{equation}
\left({\bf r}_n,\,{\bf A\,s}_n\right) = 0\;,
\label{eqn:rasn}
\end{equation}
which means that the new residual is orthogonal to the corresponding step in
the residual space. This situation is schematically shown in Figure
\ref{fig:dirres}. Second, substituting formula (\ref{eqn:alpha}) into
(\ref{eqn:rnorm}), we can conclude that the new residual decreases according
to
\begin{equation}
\|{\bf r}_n\|^2 = \|{\bf r}_{n-1}\|^2 -
{{\left({\bf r}_{n-1},\,{\bf A\,s}_n\right)^2} \over
{\|{\bf A\,s}_n\|^2}}\;,
\label{eqn:pythagor}
\end{equation}
(``Pythagoras's theorem''), unless ${\bf r}_{n-1}$ and ${\bf A\,s}_n$ are
orthogonal. These two conclusions are the basic features of optimization by
the method of steepest descent. They will help us define an improved search
direction at each iteration.

\inputdir{XFig}

\sideplot{dirres}{height=2.5in}{Geometry of the residual in the data space
(a scheme).}

\section{IN SEARCH OF THE DIRECTION} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Let's suppose we have a generator that provides particular search directions
at each step. The new direction can be the gradient of the objective
function (as in the method of steepest descent), some other operator applied
on the residual from the previous step, or, generally speaking, any
arbitrary vector in the model space. Let us denote the automatically
generated direction by ${\bf c}_n$.
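Given such a direction, taken for the moment as the step itself
(${\bf s}_n = {\bf c}_n$), a single update of equations (\ref{eqn:mn}),
(\ref{eqn:rn}), and (\ref{eqn:alpha}) can be sketched in C as follows. The
dense-matrix storage and the routine names are assumptions made for this
illustration only; this is not the program that accompanies the paper.
\begin{lstlisting}
/* Illustrative sketch: one residual-minimizing update along a given
   direction s, i.e.  alpha = (r, A s) / ||A s||^2,
   m <- m + alpha s,  r <- r - alpha A s.
   The matrix A is nd-by-nm and stored row by row. */
static double dot(int n, const double *x, const double *y) {
    double sum = 0.;
    int i;
    for (i = 0; i < n; i++) sum += x[i]*y[i];
    return sum;
}

static void forward(int nd, int nm, const double *A,
                    const double *x, double *y) {      /* y = A x */
    int i, j;
    for (i = 0; i < nd; i++) {
        y[i] = 0.;
        for (j = 0; j < nm; j++) y[i] += A[i*nm + j]*x[j];
    }
}

static void update(int nd, int nm, const double *A, const double *s,
                   double *m, double *r,
                   double *As /* scratch array, length nd */) {
    double alpha;
    int i;
    forward(nd, nm, A, s, As);                  /* A s                 */
    alpha = dot(nd, r, As) / dot(nd, As, As);   /* assumes A s nonzero */
    for (i = 0; i < nm; i++) m[i] += alpha*s[i];
    for (i = 0; i < nd; i++) r[i] -= alpha*As[i];
}
\end{lstlisting}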
According to formula (\ref{eqn:pythagor}), the residual decreases as a
result of choosing this direction by
\begin{equation}
\|{\bf r}_{n-1}\|^2 - \|{\bf r}_{n}\|^2 =
{{\left({\bf r}_{n-1},\,{\bf A\,c}_n\right)^2} \over
{\|{\bf A\,c}_n\|^2}}\;.
\label{eqn:deltar}
\end{equation}
How can we improve on this result?

\subsection{First step of the improvement}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Assuming $n>1$, we can add some amount of the previous step ${\bf s}_{n-1}$
to the chosen direction ${\bf c}_n$ to produce a new search direction
${\bf s}_n^{(n-1)}$, as follows:
\begin{equation}
{\bf s}_n^{(n-1)} = {\bf c}_n + \beta_n^{(n-1)}\,{\bf s}_{n-1}\;,
\label{eqn:cn}
\end{equation}
where $\beta_n^{(n-1)}$ is an adjustable scalar coefficient. According to
the fundamental orthogonality principle (\ref{eqn:rasn}),
\begin{equation}
\left({\bf r}_{n-1},\,{\bf A\,s}_{n-1}\right) = 0\;.
\label{eqn:rasn1}
\end{equation}
As follows from equation (\ref{eqn:rasn1}), the numerator on the right-hand
side of equation (\ref{eqn:deltar}) is not affected by the new choice of the
search direction:
\begin{equation}
\left({\bf r}_{n-1},\,{\bf A\,s}_n^{(n-1)}\right)^2 =
\left[\left({\bf r}_{n-1},\,{\bf A\,c}_n\right) +
\beta_n^{(n-1)}\,\left({\bf r}_{n-1},\,{\bf A\,s}_{n-1}\right)\right]^2 =
\left({\bf r}_{n-1},\,{\bf A\,c}_n\right)^2\;.
\label{eqn:numerator}
\end{equation}
However, we can use transformation (\ref{eqn:cn}) to decrease the
denominator in (\ref{eqn:deltar}), thus further decreasing the residual
${\bf r}_n$. We minimize the denominator
\begin{equation}
\|{\bf A\,s}_n^{(n-1)}\|^2 = \|{\bf A\,c}_n\|^2 +
2\,\beta_n^{(n-1)}\,\left({\bf A\,c}_n,\,{\bf A\,s}_{n-1}\right) +
\left(\beta_n^{(n-1)}\right)^2\,\|{\bf A\,s}_{n-1}\|^2
\label{eqn:denominator}
\end{equation}
by choosing the coefficient $\beta_n^{(n-1)}$ to be
\begin{equation}
\beta_n^{(n-1)} = - {{\left({\bf A\,c}_n,\,{\bf A\,s}_{n-1}\right)} \over
{\|{\bf A\,s}_{n-1}\|^2}}\;.
\label{eqn:beta}
\end{equation}
Note the analogy between (\ref{eqn:beta}) and (\ref{eqn:alpha}).
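The analogy is not accidental: like $\alpha_n$ in (\ref{eqn:rnorm}), the
coefficient $\beta_n^{(n-1)}$ solves a one-parameter quadratic minimization.
As a quick check (assuming ${\bf A\,s}_{n-1} \neq 0$), setting the
derivative of (\ref{eqn:denominator}) with respect to $\beta_n^{(n-1)}$ to
zero,
\[
{{d\,\|{\bf A\,s}_n^{(n-1)}\|^2} \over {d\,\beta_n^{(n-1)}}} =
2\,\left({\bf A\,c}_n,\,{\bf A\,s}_{n-1}\right) +
2\,\beta_n^{(n-1)}\,\|{\bf A\,s}_{n-1}\|^2 = 0\;,
\]
recovers exactly (\ref{eqn:beta}), just as setting the derivative of
(\ref{eqn:rnorm}) with respect to $\alpha_n$ to zero yields
(\ref{eqn:alpha}).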
Analogously to (\ref{eqn:rasn}), equation (\ref{eqn:beta}) is equivalent to
the orthogonality condition
\begin{equation}
\left({\bf A\,s}_n^{(n-1)},\,{\bf A\,s}_{n-1}\right) = 0\;.
\label{eqn:acas}
\end{equation}
Analogously to (\ref{eqn:pythagor}), applying formula (\ref{eqn:beta}) is
also equivalent to defining the minimized denominator as
\begin{equation}
\|{\bf A\,s}_n^{(n-1)}\|^2 = \|{\bf A\,c}_n\|^2 -
{{\left({\bf A\,c}_n,\,{\bf A\,s}_{n-1}\right)^2} \over
{\|{\bf A\,s}_{n-1}\|^2}}\;.
\label{eqn:pithagor2}
\end{equation}

\subsection{Second step of the improvement}%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Now let us assume $n > 2$ and add some amount of the step from the
$(n-2)$-th iteration to the search direction, determining the new direction
${\bf s}_n^{(n-2)}$, as follows:
\begin{equation}
{\bf s}_n^{(n-2)} = {\bf s}_n^{(n-1)} + \beta_n^{(n-2)}\,{\bf s}_{n-2}\;.
\label{eqn:cn2}
\end{equation}
We can deduce that after the second change, the value of the numerator in
equation (\ref{eqn:deltar}) is still the same:
\begin{equation}
\left({\bf r}_{n-1},\,{\bf A\,s}_n^{(n-2)}\right)^2 =
\left[\left({\bf r}_{n-1},\,{\bf A\,c}_n\right) +
\beta_n^{(n-2)}\,\left({\bf r}_{n-1},\,{\bf A\,s}_{n-2}\right)\right]^2 =
\left({\bf r}_{n-1},\,{\bf A\,c}_n\right)^2\;.
\label{eqn:numerator2}
\end{equation}
This remarkable fact occurs as the result of transforming the dot product
$\left({\bf r}_{n-1},\,{\bf A\,s}_{n-2}\right)$ with the help of equation
(\ref{eqn:rn}):
\begin{equation}
\left({\bf r}_{n-1},\,{\bf A\,s}_{n-2}\right) =
\left({\bf r}_{n-2},\,{\bf A\,s}_{n-2}\right) -
\alpha_{n-1}\,\left({\bf A\,s}_{n-1},\,{\bf A\,s}_{n-2}\right) = 0\;.
\label{eqn:dotprod}
\end{equation}
The first term in (\ref{eqn:dotprod}) is equal to zero according to formula
(\ref{eqn:rasn}); the second term is equal to zero according to formula
(\ref{eqn:acas}). Thus we have proved the new orthogonality equation
\begin{equation}
\left({\bf r}_{n-1},\,{\bf A\,s}_{n-2}\right) = 0\;,
\label{eqn:rasn2}
\end{equation}
which in turn leads to the numerator invariance (\ref{eqn:numerator2}). The
value of the coefficient $\beta_n^{(n-2)}$ in (\ref{eqn:cn2}) is defined
analogously to (\ref{eqn:beta}) as
\begin{equation}
\beta_n^{(n-2)} = - {{\left({\bf A\,s}_n^{(n-1)},\,{\bf A\,s}_{n-2}\right)} \over
{\|{\bf A\,s}_{n-2}\|^2}} =
- {{\left({\bf A\,c}_n,\,{\bf A\,s}_{n-2}\right)} \over
{\|{\bf A\,s}_{n-2}\|^2}}\;,
\label{eqn:beta2}
\end{equation}
where we have again used equation (\ref{eqn:acas}).
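In computational terms, a coefficient of the form (\ref{eqn:beta}) or
(\ref{eqn:beta2}) costs only two dot products with a stored vector
${\bf A\,s}_j$. A hypothetical C helper (plain arrays and names assumed for
illustration only) might read:
\begin{lstlisting}
/* beta = -(A c, A s_j) / ||A s_j||^2  for one stored previous step.
   Ac and Asj hold the data-space vectors A c_n and A s_j, length nd. */
static double beta_coef(int nd, const double *Ac, const double *Asj) {
    double num = 0., den = 0.;
    int i;
    for (i = 0; i < nd; i++) {
        num += Ac[i]*Asj[i];     /* (A c_n, A s_j) */
        den += Asj[i]*Asj[i];    /* ||A s_j||^2    */
    }
    return -num/den;             /* assumes A s_j is nonzero */
}
\end{lstlisting}
The memory cost mentioned in the introduction comes from keeping the
previous steps ${\bf s}_j$ (and, in practice, their images ${\bf A\,s}_j$)
needed for such computations.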
If ${\bf A\,s}_{n-2}$ is not orthogonal to ${\bf A\,c}_n$, the second step
of the improvement leads to a further decrease of the denominator in
(\ref{eqn:pythagor}) and, consequently, to a further decrease of the
residual.

\subsection{Induction}%%%%%%%%%%%%%%%%%%

Continuing by induction the process of adding a linear combination of the
previous steps to the arbitrarily chosen direction ${\bf c}_n$ (known in
mathematics as the {\em Gram--Schmidt orthogonalization process}), we
finally arrive at the complete definition of the new step ${\bf s}_n$, as
follows:
\begin{equation}
{\bf s}_n = {\bf s}_n^{(1)} = {\bf c}_{n} +
\sum_{j=1}^{n-1}\,\beta_n^{(j)}\,{\bf s}_{j}\;.
\label{eqn:step}
\end{equation}
Here the coefficients $\beta_n^{(j)}$ are defined by equations
\begin{equation}
\beta_n^{(j)} = - {{\left({\bf A\,c}_n,\,{\bf A\,s}_{j}\right)} \over
{\|{\bf A\,s}_{j}\|^2}}\;,
\label{eqn:betaj}
\end{equation}
which correspond to the orthogonality principles
\begin{equation}
\left({\bf A\,s}_n,\,{\bf A\,s}_{j}\right) = 0\;,\;\;1 \leq j \leq n-1
\label{eqn:asj}
\end{equation}
and
\begin{equation}
\left({\bf r}_{n},\,{\bf A\,s}_{j}\right) = 0\;,\;\;1 \leq j \leq n\;.
\label{eqn:rasnj}
\end{equation}
It is these orthogonality properties that allowed us to optimize the search
parameters one at a time instead of solving the $n$-dimensional system of
optimization equations for $\alpha_n$ and $\beta_n^{(j)}$.

\section{ALGORITHM}

The results of the preceding sections define the method of conjugate
directions to consist of the following algorithmic steps:
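As a hedged illustration of how these pieces can be assembled, the following
self-contained C sketch combines the line minimization
(\ref{eqn:mn})--(\ref{eqn:alpha}) with the Gram--Schmidt construction
(\ref{eqn:step})--(\ref{eqn:betaj}), using the steepest-descent choice
${\bf c}_n = {\bf A}^{\mathrm T}\,{\bf r}_{n-1}$ for the generated direction
(an inexact adjoint or another preconditioner could be substituted). The
dense-matrix storage, the tiny test problem, and all routine names are
assumptions made for this illustration only; this is not the program that
accompanies the paper.
\begin{lstlisting}
/* Conjugate-direction sketch for a dense nd-by-nm matrix A:
 *   c_n   : generated direction (here A^T r, but any generator works);
 *   s_n   = c_n + sum_j beta_j s_j, beta_j = -(A c_n, A s_j)/||A s_j||^2;
 *   alpha = (r, A s_n)/||A s_n||^2;  m += alpha s_n;  r -= alpha A s_n. */
#include <stdio.h>

#define ND 3        /* data size            */
#define NM 2        /* model size           */
#define NITER 2     /* number of iterations */

static double dot(int n, const double *x, const double *y) {
    double sum = 0.;
    int i;
    for (i = 0; i < n; i++) sum += x[i]*y[i];
    return sum;
}

static void forward(const double *A, const double *x, double *y) { /* y = A x */
    int i, j;
    for (i = 0; i < ND; i++) {
        y[i] = 0.;
        for (j = 0; j < NM; j++) y[i] += A[i*NM+j]*x[j];
    }
}

/* x = A^T y: used only to generate c_n; replace with any preconditioner. */
static void adjoint(const double *A, const double *y, double *x) {
    int i, j;
    for (j = 0; j < NM; j++) {
        x[j] = 0.;
        for (i = 0; i < ND; i++) x[j] += A[i*NM+j]*y[i];
    }
}

int main(void) {
    double A[ND*NM] = {1., 0.,  0., 1.,  1., 1.};  /* small test system */
    double d[ND] = {1., 2., 3.};
    double m[NM] = {0., 0.};                /* initial model m_0        */
    double r[ND];                           /* residual r = d - A m     */
    double s[NITER][NM], As[NITER][ND];     /* all previous s_j, A s_j  */
    double c[NM], Ac[ND];
    int n, i, j;

    for (i = 0; i < ND; i++) r[i] = d[i];   /* r_0 = d - A m_0 = d      */

    for (n = 0; n < NITER; n++) {
        double alpha;
        adjoint(A, r, c);                   /* generated direction c_n  */
        forward(A, c, Ac);
        /* Gram-Schmidt step: orthogonalize against all stored A s_j */
        for (j = 0; j < NM; j++) s[n][j] = c[j];
        for (i = 0; i < ND; i++) As[n][i] = Ac[i];
        for (j = 0; j < n; j++) {
            double beta = -dot(ND, Ac, As[j]) / dot(ND, As[j], As[j]);
            int k;
            for (k = 0; k < NM; k++) s[n][k] += beta * s[j][k];
            for (k = 0; k < ND; k++) As[n][k] += beta * As[j][k];
        }
        /* line minimization and model/residual update */
        alpha = dot(ND, r, As[n]) / dot(ND, As[n], As[n]);
        for (j = 0; j < NM; j++) m[j] += alpha * s[n][j];
        for (i = 0; i < ND; i++) r[i] -= alpha * As[n][i];
        printf("iteration %d: m = (%g, %g), |r|^2 = %g\n",
               n+1, m[0], m[1], dot(ND, r, r));
    }
    return 0;
}
\end{lstlisting}
In exact arithmetic with the exact adjoint, this sketch reduces to conjugate
gradients and converges in at most as many iterations as there are model
dimensions; with an inexact adjoint or another preconditioner substituted
for the adjoint routine, the optimal choice of $\alpha_n$ still guarantees
that no iteration increases the residual.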