\def\sx#1{}\def\bx#1{#1}\def\eq{\quad =\quad}
\title{Preconditioning}
\author{Jon Claerbout}
\maketitle
\label{paper:prc}
\sx{precondition}
%When problems have less than about $10^3$ unknowns,
%``exact'' methods are generally satisfactory.
%When there are more unknowns,
%we use iterative methods.
%Iterative methods are about as fast as exact methods,
%but they allow us to hope that when we give up exhausted
%that we are near the best that can be done.
%With more experience,
%I learned that we often remain far from the solution we seek.
%Thus we seek generally applicable problem reformulations
%that will converge more rapidly.
%Luckily we find that the helix gives a huge speedup in many problems,
%in those problems that are densely sampled in model space.
When I first realized that practical imaging methods in widespread
industrial use amounted merely to the adjoint of forward modeling,
I (and others) thought an easy way to achieve fame and fortune
would be to introduce the first steps towards inversion
along the lines of Chapter \ref{lsq/paper:lsq}.
Although inversion generally requires a prohibitive number
of steps, I felt that moving in the gradient direction,
the direction of steepest descent, would move us rapidly
in the direction of practical improvements.
This turned out to be optimistic.
It was too slow.
But then I learned about the conjugate gradient method that
spectacularly overcomes a well-known speed problem with the
method of steepest descents.
I came to realize that it was still too slow.
I learned this by watching the convergence in Figure \ref{fig:conv}.
This led me to the helix method in Chapter \ref{hlx/paper:hlx}.
Here we'll see how it speeds many applications.
\par
We'll also come to understand why the gradient is such a poor direction
both for steepest descent and for conjugate gradients.
An indication of our path is found in the contrast between the exact solution
$\bold m = (\bold A'\bold A)^{-1}\bold A'\bold d$ and the gradient
$\Delta \bold m = \bold A'\bold d$
(which is the first step starting from $\bold m =\bold 0$).
Notice that $\Delta \bold m$ differs from $\bold m$
by the factor $(\bold A'\bold A)^{-1}$.
This factor is sometimes called a spectrum,
and in some situations it literally is a frequency spectrum.
In these cases, $\Delta \bold m$ simply gets a different
spectrum from $\bold m$, and many iterations are required to fix it.
Here we'll find that for many problems,
``preconditioning'' with the helix is a better way.

\section{PRECONDITIONED DATA FITTING}
\par
Iterative methods (like conjugate-directions) can sometimes be accelerated
by a \bx{change of variables}.
The simplest change of variable is called a ``trial solution''.
Formally, we write the solution as
\begin{equation}
\bold m \eq \bold S \bold p
\end{equation}
where $\bold m$ is the map we seek,
columns of the matrix $\bold S$ are ``shapes'' that we like,
and the unknown coefficients in $\bold p$
select amounts of the favored shapes.
The variables $\bold p$ are often called the ``preconditioned variables''.
It is not necessary that $\bold S$ be an invertible matrix,
but we'll see later that invertibility is helpful.
Take this trial solution and insert it into
a typical fitting goal
\begin{equation}
\bold 0 \quad\approx\quad  \bold F \bold m \ -\  \bold d
\end{equation}
and get
\begin{equation}
\bold 0 \quad\approx\quad  \bold F \bold S \bold p \ -\  \bold d
\end{equation}
We pass the operator $\bold F \bold S$ to our iterative solver.
After finding the best fitting $\bold p$,
we merely evaluate $\bold m = \bold S \bold p$
to get the solution to the original problem.
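To make the bookkeeping concrete, here is a minimal numerical sketch of this
change of variables.
Everything in it (the sizes, the random stand-in operators, and the use of the
generic SciPy solver \texttt{lsqr}) is an illustrative assumption,
not the setup of any particular application.
\par\noindent\footnotesize
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import lsqr

# Stand-in operators and data; all sizes and names are illustrative only.
rng = np.random.default_rng(0)
nd, nm = 30, 50                        # data-space and model-space sizes
F = rng.standard_normal((nd, nm))      # plays the role of the operator F
S = np.tril(np.ones((nm, nm)))         # columns of S are step "shapes" we like
d = rng.standard_normal(nd)            # observed data

FS = F @ S                             # the operator handed to the solver
p = lsqr(FS, d)[0]                     # best-fitting preconditioned variable
m = S @ p                              # m = S p solves the original problem
\end{verbatim}
\normalsize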
\par
We hope this change of variables has saved effort.
For each iteration there is a little more work:
instead of the iterative application of $\bold F$ and $\bold F'$,
we have iterative application of $\bold F\bold S$ and $\bold S'\bold F'$.
Our hope is that the number of iterations decreases because we are clever,
or because we have been lucky in our choice of $\bold S$.
Hopefully, the extra work of the preconditioner operator $\bold S$
is not large compared to that of $\bold F$.
If we should be so lucky that $\bold S= \bold F^{-1}$,
then we get the solution immediately.
Obviously we would try any guess with $\bold S\approx \bold F^{-1}$.
Where I have known such $\bold S$ matrices,
I have often found that convergence is accelerated, but not by much.
Sometimes it is worth using $\bold F\bold S$ for a while in the beginning,
but later it is cheaper and faster to use only $\bold F$.
A practitioner might regard the guess of $\bold S$
as prior information,
like the guess of the initial model $\bold m_0$.
\par
For a square matrix $\bold S$,
the use of a preconditioner should not change the ultimate solution.
Taking $\bold S$ to be a tall rectangular matrix
reduces the number of adjustable parameters and changes the solution:
we get it more quickly, but at lower resolution.
\subsection{Preconditioner with a starting guess}
In many applications, for many reasons,
we have a starting guess $\bold m_0$ of the solution.
You might worry that
you could not find the starting preconditioned variable
$\bold p_0= \bold S^{-1}\bold m_0$
because you did not know the inverse of $\bold S$.
The way to avoid this problem is to reformulate the problem
in terms of a new variable $\tilde {\bold m}$,
where $ \bold m = \tilde {\bold m} + \bold m_0$.
Then $\bold 0\approx \bold F \bold m - \bold d$
becomes
$\bold 0\approx \bold F \tilde {\bold m} - (\bold d - \bold F \bold m_0)$,
or
$\bold 0\approx \bold F \tilde {\bold m} - \tilde {\bold d}$.
Thus we have accomplished the goal of taking
a problem with a nonzero starting model
and converting it to a problem of the same type
with a zero starting model.
We do not need the inverse of $\bold S$
because the iteration starts from $\tilde {\bold m}=\bold 0$,
so $\bold p_0 = \bold 0$.
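A short sketch of the same bookkeeping with a nonzero starting guess follows
(again with stand-in NumPy operators and the generic solver \texttt{lsqr}
as assumptions); note that no inverse of $\bold S$ appears anywhere.
\par\noindent\footnotesize
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
nd, nm = 30, 50
F = rng.standard_normal((nd, nm))      # stand-in modeling operator
S = np.tril(np.ones((nm, nm)))         # stand-in preconditioner (never inverted)
d = rng.standard_normal(nd)
m0 = rng.standard_normal(nm)           # nonzero starting guess of the model

d_tilde = d - F @ m0                   # shifted data:  0 ~ F m_tilde - d_tilde
p = lsqr(F @ S, d_tilde)[0]            # iterate on p, starting from p0 = 0
m = m0 + S @ p                         # m = m0 + m_tilde,  with  m_tilde = S p
\end{verbatim}
\normalsize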
\section{PRECONDITIONING THE REGULARIZATION}
%\subsection{Recasting a fitting problem in white variables}
\par
The basic formulation of a geophysical estimation problem
consists of setting up {\em two} goals,
one for data fitting and the other for model shaping.
With two goals, preconditioning is somewhat different.
The two goals may be written as:
\begin{eqnarray}
\bold 0 &\approx& \bold F \bold m - \bold d \\
\bold 0 &\approx& \bold A \bold m
\end{eqnarray}
which defines two residuals,
a so-called ``data residual'' and a ``model residual'', that
are usually minimized by conjugate-gradient, least-squares methods.
\par
To fix ideas, let us examine a toy example.
The data and the first three rows of the matrix below
are random numbers truncated to integers.
The model roughening operator $\bold A$
is a first-differencing operator times 100.
%\newslide
{\samepage\par\noindent\footnotesize
\begin{verbatim}
d(m)     F(m,n)                                            iter  Norm
---     ------------------------------------------------   ---- -----------
 41.    -55. -90. -24. -13. -73.  61. -27. -19.  23. -55.     1 20.00396538
 33.      8. -86.  72.  87. -41.  -3. -29.  29. -66.  50.     2 12.14780140
-58.     84. -49.  80.  44. -52. -51.   8.  86.  77.  50.     3  8.94393635
  0.    100.   0.   0.   0.   0.   0.   0.   0.   0.   0.     4  6.04517126
  0.   -100. 100.   0.   0.   0.   0.   0.   0.   0.   0.     5  2.64737511
  0.      0.-100. 100.   0.   0.   0.   0.   0.   0.   0.     6  0.79238468
  0.      0.   0.-100. 100.   0.   0.   0.   0.   0.   0.     7  0.46083349
  0.      0.   0.   0.-100. 100.   0.   0.   0.   0.   0.     8  0.08301232
  0.      0.   0.   0.   0.-100. 100.   0.   0.   0.   0.     9  0.00542009
  0.      0.   0.   0.   0.   0.-100. 100.   0.   0.   0.    10  0.00000565
  0.      0.   0.   0.   0.   0.   0.-100. 100.   0.   0.    11  0.00000026
  0.      0.   0.   0.   0.   0.   0.   0.-100. 100.   0.    12  0.00000012
  0.      0.   0.   0.   0.   0.   0.   0.   0.-100. 100.    13  0.00000000
\end{verbatim}
}\normalsize
\par
Notice that at the tenth iteration
the residual suddenly plunges by four significant digits.
Since there are ten unknowns and the matrix is obviously full-rank,
conjugate-gradient theory tells us to expect
the exact solution at the tenth iteration.
This is the first miracle of conjugate gradients.
(The residual actually does not drop to zero.
What is printed in the \texttt{Norm} column
is the square root of the sum of the squares
of the residual components at the \texttt{iter}-th
iteration minus that at the last iteration.)
\subsection{The second miracle of conjugate gradients}
The second miracle of conjugate gradients is exhibited below.
The data and data-fitting matrix are the same,
but the model damping is simplified.
%\newslide
\par\noindent\footnotesize
\begin{verbatim}
d(m)    F(m,n)                                            iter  Norm
---    ------------------------------------------------   ----  ----------
 41.   -55. -90. -24. -13. -73.  61. -27. -19.  23. -55.     1  3.64410686
 33.     8. -86.  72.  87. -41.  -3. -29.  29. -66.  50.     2  0.31269890
-58.    84. -49.  80.  44. -52. -51.   8.  86.  77.  50.     3 -0.00000021
  0.   100.   0.   0.   0.   0.   0.   0.   0.   0.   0.     4 -0.00000066
  0.     0. 100.   0.   0.   0.   0.   0.   0.   0.   0.     5 -0.00000080
  0.     0.   0. 100.   0.   0.   0.   0.   0.   0.   0.     6 -0.00000065
  0.     0.   0.   0. 100.   0.   0.   0.   0.   0.   0.     7 -0.00000088
  0.     0.   0.   0.   0. 100.   0.   0.   0.   0.   0.     8 -0.00000074
  0.     0.   0.   0.   0.   0. 100.   0.   0.   0.   0.     9 -0.00000035
  0.     0.   0.   0.   0.   0.   0. 100.   0.   0.   0.    10 -0.00000037
  0.     0.   0.   0.   0.   0.   0.   0. 100.   0.   0.    11 -0.00000018
  0.     0.   0.   0.   0.   0.   0.   0.   0. 100.   0.    12  0.00000000
  0.     0.   0.   0.   0.   0.   0.   0.   0.   0. 100.    13  0.00000000
\end{verbatim}
\normalsize
\par\noindent
Even though the matrix is full-rank,
we see the residual drop about six decimal places after the third iteration!
This convergence behavior is well known
in the computational mathematics literature.
Despite its practical importance,
it doesn't seem to have a name or an identified discoverer.
So I call it the ``second miracle.''
\par
Practitioners usually don't like
the identity operator for model shaping.
Generally they prefer to penalize wiggliness.
For practitioners,
the lesson of the second miracle of conjugate gradients
is that we have a choice between many iterations
and learning to transform
the independent variables so that
the regularization operator becomes an identity matrix.
Basically, such a transformation reduces the iteration count
from something about the size of the model space
to something about the size of the data space.
Such a transformation is called preconditioning.
In practice, data is often accumulated in bins.
Then the iteration count is reduced (in principle)
to the count of full bins
and should be independent of the count of the empty bins.
This allows refining the bins, enhancing the resolution.
\par
More generally,
the model goal $\bold 0 \approx \bold A \bold m$
introduces a roughening operator like a gradient, a Laplacian
(and in Chapter \ref{mda/paper:mda}
a Prediction-Error Filter (PEF)).
Thus the model goal is usually a filter,
unlike the data-fitting goal,
which involves all manner of geometry and physics.
When the model goal is a filter, its inverse is also a filter.
Of course this includes multidimensional filters with a helix.
\par
The preconditioning transformation $\bold m = \bold S \bold p$
gives us
\begin{equation}
        \begin{array}{lll}
        \bold 0 &\approx & \bold F \bold S \bold p - \bold d \\
        \bold 0 &\approx & \bold A \bold S \bold p
        \label{eqn:invintprecond}
        \end{array}
\end{equation}
The operator $\bold A$ is a roughener, while $\bold S$ is a smoother.
The choices of both $\bold A$ and $\bold S$ are somewhat subjective.
This suggests that we eliminate $\bold A$ altogether
by {\em defining} it to be proportional to the inverse of $\bold S$,
thus $\bold A\bold S=\bold I$.
The fitting goals become
\begin{equation}
        \label{eqn:whitevar1}
        \begin{array}{lll}
        \bold 0 &\approx & \bold F  \bold S \bold p - \bold d \\
        \bold 0 &\approx & \epsilon\ \bold p
        \end{array}
\end{equation}
which enables us to benefit from the ``second miracle''.
After finding $\bold p$,
we obtain the final model with $\bold m = \bold S \bold p$.
%Because model shaping is generally a filtering operation,
%and because preconditioning operators are best when
%they are invertible,
%\par
%As before (without regularization) we need a pair of
%roughening and smoothing operators that are mutually inverse
%$ \bold A\bold S=\bold I$.
%We need the roughener $\bold A$
%to get started with
%$\bold p_0=\bold A\bold m_0$,
%and we need the smoother $\bold S$ at each iteration.
%Besides Fourier transforms,
%the only known multidimensional roughening-smoothing operator pairs
%are the family of filters on a helix.
%Thus    deconvolution on a helix is an all-purpose
%        preconditioning strategy for multidimensional model regularization.
%\begin{notforlecture}
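The following sketch revisits the toy example numerically
(NumPy, with an arbitrary random seed,
so the printed numbers will not match the tables above).
It stacks the two goals into one least-squares problem,
first with the roughener $\bold A$ and then in the
preconditioned form (\ref{eqn:whitevar1}) with
$\bold S=\bold A^{-1}$ so that $\bold A\bold S=\bold I$;
a hand-written conjugate-gradient solver records the residual norm
at each iteration.
\par\noindent\footnotesize
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2025)               # arbitrary seed; numbers differ from the tables
nd, nm = 3, 10
F = np.trunc(rng.uniform(-99., 99., (nd, nm)))  # random integer-valued data-fitting rows
d = np.trunc(rng.uniform(-99., 99., nd))
A = 100.0 * (np.eye(nm) - np.eye(nm, k=-1))     # first difference times 100 (invertible)

def cgls(G, rhs, niter):
    """Conjugate gradients on the normal equations of 0 ~ G x - rhs.
    Returns x and the residual norm after each iteration."""
    x = np.zeros(G.shape[1])
    r = rhs.astype(float)
    s = G.T @ r
    p = s.copy()
    gamma = s @ s
    norms = []
    for _ in range(niter):
        q = G @ p
        if q @ q == 0.0:                        # already converged exactly
            break
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        norms.append(np.linalg.norm(r))
        s = G.T @ r
        gnew = s @ s
        p = s + (gnew / gamma) * p
        gamma = gnew
    return x, np.array(norms)

rhs = np.concatenate([d, np.zeros(nm)])

# Goals 0 ~ F m - d and 0 ~ A m: expect the plunge near iteration nm = 10.
m, norms = cgls(np.vstack([F, A]), rhs, 13)
print(norms - norms[-1])                        # analogous to the Norm column above

# Preconditioned goals with A S = I, i.e. S = inverse of A (causal integration):
S = np.linalg.inv(A)
eps = 1.0                                       # epsilon is a free choice here
p_hat, norms_p = cgls(np.vstack([F @ S, eps * np.eye(nm)]), rhs, 13)
print(norms_p - norms_p[-1])                    # should plunge after about nd = 3 iterations
m_precond = S @ p_hat                           # back to the original model variable
\end{verbatim}
\normalsize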
\subsection{Importance of scaling}
Another simple toy example shows us the importance of scaling.
We use the same example as above,
except that the $i$-th column is multiplied by $i/10$,
which means the $i$-th model variable has been divided by $i/10$.
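The setup of this scaled example might look as follows
(again a NumPy sketch with an arbitrary seed, so the numbers are not those of the text).
Scaling the $i$-th column by $i/10$ amounts to a diagonal change of variables
with $\bold S$ a diagonal matrix of the factors $i/10$,
and the iteration counts reported by the generic solver \texttt{lsqr}
can be compared with and without the scaling.
\par\noindent\footnotesize
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2025)               # arbitrary seed, not the text's numbers
nd, nm = 3, 10
F = np.trunc(rng.uniform(-99., 99., (nd, nm)))  # toy data-fitting rows, as before
d = np.trunc(rng.uniform(-99., 99., nd))

# Multiplying the i-th column of F by i/10 is a diagonal change of variables
# m = S p with S = diag(i/10); the i-th model variable is divided by i/10.
S = np.diag(np.arange(1, nm + 1) / 10.0)
rhs = np.concatenate([d, np.zeros(nm)])
for label, Fi in (("unscaled", F), ("scaled", F @ S)):
    G = np.vstack([Fi, 100.0 * np.eye(nm)])     # identity damping, as in the second table
    x, istop, itn = lsqr(G, rhs)[:3]
    print(label, "iterations:", itn)
\end{verbatim}
\normalsize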
