\moddex{solver_reg}{generic solver with regularization}\end{comment}\par
% The subroutine for this job does not initialize the original model.
% The caller might want to initialize the model with $\bold m=\bold 0$
% or with some other model $\bold m_0$.
After all the definitions, we load the negative of the data into the residual.
If a starting model $\bold m_0$ is present,
then we update the data part of the residual
$\bold r_d = \bold F \bold m_0 - \bold d$
and we load the model part of the residual
$\bold r_m = \bold A \bold m_0$.
Otherwise we begin from a zero model $\bold m_0 = \bold 0$ and thus
the model part of the residual $\bold r_m$ is also zero.
After this initialization, subroutine \texttt{solver\_reg()} % \vpageref{lst:solver_reg}
begins an iteration loop by first computing
the proposed model perturbation $\Delta \bold m$
(called \texttt{g} in the program)
with the adjoint operator:
\begin{equation}
 \Delta \bold m \quad\longleftarrow\quad
 \left[ \begin{array}{cc} \bold F' & \bold A' \end{array} \right] \
 \left[ \begin{array}{c} \bold r_d \\ \bold r_m \end{array} \right]
\end{equation}
\begin{comment}
I chose to implement the model roughening operator $\bold A$
with the convolution subroutine \texttt{tcai1()} \vpageref{lst:tcai1},
which has transient end effects
(and an output length equal to the input length plus the filter length).
The adjoint of subroutine {\tt tcai1()} suggests perturbations
in the convolution input (not the filter).
\end{comment}
Using this value of $\Delta \bold m$,
% {\tt invint1()}
we can find the implied change in residual $\Delta \bold r$ as
\begin{equation}
 \Delta \left[ \begin{array}{c} \bold r_d \\ \bold r_m \end{array} \right]
 \quad\longleftarrow\quad
 \left[ \begin{array}{c} \bold F \\ \bold A \end{array} \right] \ \Delta \bold m
\end{equation}
and the last thing in the loop is to use
the optimization step function \texttt{stepper()}
% conjugate-direction subroutine \texttt{cgplus()} \vpageref{lst:cgplus}
to choose the size of the step
and how much of the previous step to include.
\par
An example of using the new solver is subroutine \texttt{invint1}.
%\vpageref{lst:invint1}.
I chose to implement the model roughening operator $\bold A$
with the convolution subroutine \texttt{tcai1()} \vpageref{lst:tcai1},
which has transient end effects
(and an output length equal to the input length plus the filter length).
The adjoint of subroutine {\tt tcai1()} suggests perturbations
in the convolution input (not the filter).
\moddex{invint1}{inverse linear interp.}{24}{40}{user/gee}
%\end{notforlecture}
\par
Figure \ref{fig:im1-2+1} shows an example for a $(1,-2,1)$ filter with $\epsilon = 1$.
The continuous curve representing the model $\bold m$
passes through the data points.
Because the models are computed with transient convolution end-effects,
the models tend to damp linearly to zero outside the region where
signal samples are given.
\sideplot{im1-2+1}{width=3in,height=1.5in}{
	Sample points and estimation of a continuous function through them.
}
\par
To show an example where the result is clearly a theoretical answer,
I prepared another figure with the simpler filter $(1,-1)$.
When we minimize energy in the first derivative of the waveform,
the residual distributes itself uniformly
between data points,
so the solution there is a straight line.
Theoretically it should be a straight line
because a straight line has a vanishing second derivative,
and that condition arises by differentiating the minimized quadratic form
$\bold x' \bold A' \bold A \bold x$ with respect to $\bold x'$
and getting $\bold A' \bold A \bold x = \bold 0$.
(By this logic, the curves between data points in Figure \ref{fig:im1-2+1}
must be cubics.)
The $(1,-1)$ result is shown in Figure \ref{fig:im1-1a}.
\sideplot{im1-1a}{width=3in,height=1.5in}{
	The same data samples and a function through them that
	minimizes the energy in the first derivative.
}
\par
The example of Figure \ref{fig:im1-1a}
has been a useful test case for me.
You'll see it again in later chapters.
What I would like to show you here is a movie showing the convergence
to Figure \ref{fig:im1-1a}.
Convergence occurs rapidly where data points are close together.
The large gaps, however, fill at a rate of one point per iteration.
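\par
For readers who want to try the recipe outside the book's Fortran modules,
the following is a minimal NumPy sketch of the same stacked regression
$\bold 0 \approx \bold F \bold m - \bold d$ and
$\bold 0 \approx \epsilon \bold A \bold m$.
The mesh size, data positions, data values, and the value of $\epsilon$
are invented for illustration,
and the stacked system is solved directly with \texttt{numpy.linalg.lstsq}
rather than by the conjugate-direction iteration of \texttt{solver\_reg()}.
\begin{verbatim}
# Minimal NumPy sketch (not the book's Fortran): regularized inverse
# linear interpolation solved as one stacked least-squares problem.
# Mesh size, data positions, data values, and epsilon are made up.
import numpy as np

nm = 61                                   # model mesh points
pos = np.array([5, 8, 12, 35, 50])        # mesh indices carrying data
d = np.array([1.0, 2.0, 1.5, -1.0, 0.5])  # observed values there

# F: selection operator (the simplest interpolation), one row per datum
F = np.zeros((d.size, nm))
F[np.arange(d.size), pos] = 1.0

# A: transient convolution with the (1,-1) roughening filter,
# in the spirit of tcai1(); the output is longer than the input
aa = np.array([1.0, -1.0])
A = np.zeros((nm + aa.size - 1, nm))
for k, a in enumerate(aa):
    A[k:k + nm, :] += a * np.eye(nm)

eps = 0.01
# Stack the goals 0 ~ F m - d and 0 ~ eps A m and solve directly
# (the book's solver_reg() minimizes the same objective iteratively).
G = np.vstack([F, eps * A])
rhs = np.concatenate([d, np.zeros(A.shape[0])])
m = np.linalg.lstsq(G, rhs, rcond=None)[0]
# Gaps between data points fill with (nearly) straight lines,
# as the text predicts for the (1,-1) roughener.
\end{verbatim}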
\subsection{Abandoned theory for matching wells and seismograms}
\par
Let us consider theory to
construct a map $\bold m$ that fits dense seismic data
$\bold s$ and the well data $\bold w$.
The first goal
$\bold 0 \approx \bold L \bold m - \bold w$
says that when we linearly interpolate from the map,
we should get the well data.
The second goal
$\bold 0 \approx \bold A (\bold m - \bold s)$
(where $\bold A$ is a roughening operator like $\nabla$ or $\nabla^2$)
says that the map $\bold m$ should match the seismic data $\bold s$
at high frequencies but need not do so at low frequencies.
\begin{equation}
 \begin{array}{lll}
 \bold 0 &\approx & \bold L \bold m - \bold w \\
 \bold 0 &\approx & \bold A (\bold m - \bold s)
 \label{eqn:toldisregression}
 \end{array}
\end{equation}
\par
Although (\ref{eqn:toldisregression}) is the way I originally formulated
the well-fitting problem, I abandoned it for several reasons:
First, the map had ample pixel resolution compared to other sources of error,
so I switched from linear interpolation to binning.
Once I was using binning,
I had available the simpler empty-bin approaches.
These have the further advantage that it is not necessary
to experiment with the relative weighting between
the two goals in (\ref{eqn:toldisregression}).
A formulation like (\ref{eqn:toldisregression}) is more likely
to be helpful where we need to handle rapidly changing functions,
where binning is inferior to linear interpolation,
perhaps in reflection seismology where high resolution is meaningful.
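\par
Even though the formulation was abandoned, it is simple to prototype.
Below is a small NumPy sketch that stacks the two goals of
(\ref{eqn:toldisregression}) into a single least-squares system.
The mesh size, well positions and values, the stand-in seismic trend,
and the relative weight between the goals are all invented for illustration,
and a one-dimensional first difference stands in for the roughener $\bold A$.
\begin{verbatim}
# Sketch of the abandoned formulation (invented sizes and weights):
#   0 ~ L m - w      (linear interpolation from map to wells)
#   0 ~ A (m - s)    (roughened map should match roughened seismic)
import numpy as np

nm = 40
s = np.sin(np.linspace(0.0, 3.0, nm))      # stand-in "seismic" trend
wellpos = np.array([4.3, 17.8, 31.2])      # fractional mesh positions
w = np.array([0.9, 0.1, -0.7])             # stand-in well values

# L: 1-D linear interpolation, two weights per well
L = np.zeros((w.size, nm))
i = np.floor(wellpos).astype(int)
frac = wellpos - i
L[np.arange(w.size), i] = 1.0 - frac
L[np.arange(w.size), i + 1] = frac

A = np.diff(np.eye(nm), axis=0)            # gradient-like roughener
lam = 1.0                                  # relative weight (needs tuning)

G = np.vstack([L, lam * A])
rhs = np.concatenate([w, lam * (A @ s)])
m = np.linalg.lstsq(G, rhs, rcond=None)[0]
# m keeps the high-frequency character of s while being adjusted
# smoothly toward the well values (how closely depends on lam).
\end{verbatim}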
\begin{exer}
\item It is desired to find a compromise between the Laplacian roughener
	and the gradient roughener.
	What is the size of the residual space?
	\sx{roughener ! Laplacian}
	\sx{roughener ! gradient}
\item Like the seismic prospecting industry,
	you have solved a huge problem using binning.
	You have computer power left over to do a few iterations
	with linear interpolation.
	How much does the cost per iteration increase?
	Should you refine your model mesh,
	or can you use the same model mesh that you used when binning?
\end{exer}
%\end{notforlecture}
\section{PREJUDICE, BULLHEADEDNESS, AND CROSS VALIDATION}
First we look at data $\bold d$.
Then we think about a model $\bold m$,
and an operator $\bold L$ to link the model and the data.
Sometimes the operator is merely the first term in a series expansion
about $(\bold m_0,\bold d_0)$.
Then we fit
$\bold d - \bold d_0 \approx \bold L ( \bold m - \bold m_0)$.
To fit the model, we must reduce the fitting residuals.
Realizing that the importance of a data residual
is not always simply the size of the residual
but is generally a function of it,
we conjure up (topic for later chapters)
a weighting function (which could be a filter) operator $\bold W$.
This defines our data residual:
\begin{equation}
\bold r_d \eq \bold W [ \bold L ( \bold m - \bold m_0)\ -\ ( \bold d - \bold d_0) ]
\end{equation}
\par
Next we realize that the data might not be adequate to determine the model,
perhaps because our comfortable dense sampling of the model
ill fits our economical sparse sampling of data.
Thus we adopt a fitting goal that mathematicians call ``regularization''
and we might call a ``model style'' goal
or, more simply,
a quantification of our prejudice about models.
We express this by choosing an operator $\bold A$,
often simply a roughener like a gradient
(the choice again a topic in this and later chapters).
It defines our model residual by
$\bold A \bold m$ or
$\bold A ( \bold m - \bold m_0)$; say we choose
\begin{equation}
\bold r_m \eq \bold A \bold m
\end{equation}
\par
In an ideal world, our model prejudice would not conflict
with measured data; however,
life is not so simple.
Since conflicts between data and preconceived notions invariably arise
(and they are why we go to the expense of acquiring data),
we need an adjustable parameter
that measures our ``bullheadedness,'' how much we intend
to stick to our preconceived notions in spite of contradicting data.
This parameter is generally called epsilon $\epsilon$
because we like to imagine that our bullheadedness is small.
(In mathematics, $\epsilon$ is often taken to be
an infinitesimally small quantity.)
Although any bullheadedness seems like a bad thing,
it must be admitted that measurements are imperfect too.
Thus as a practical matter we often find ourselves minimizing
\begin{equation}
\min \quad := \quad
\bold r_d \cdot \bold r_d \ +\ \epsilon^2\ \bold r_m \cdot \bold r_m
\end{equation}
and wondering what to choose for $\epsilon$.
I have two suggestions:
My simplest suggestion is to choose $\epsilon$
so that the residual of data fitting matches that of model styling.
Thus
\begin{equation}
\epsilon \eq \sqrt{ \bold r_d \cdot \bold r_d
                    \over
                    \bold r_m \cdot \bold r_m }
\end{equation}
My second suggestion is to think of the force on our final solution.
In physics, force is associated with a gradient.
We have a gradient for the data fitting
and another for the model styling:
\begin{eqnarray}
\bold g_d &=& \bold L' \bold W' \bold r_d \\
\bold g_m &=& \bold A' \bold r_m
\end{eqnarray}
We could balance these forces by the choice
\begin{equation}
\epsilon \eq \sqrt{ \bold g_d \cdot \bold g_d
                    \over
                    \bold g_m \cdot \bold g_m }
\end{equation}
Although we often ignore $\epsilon$ in discussing the formulation
of a problem, when the time comes to solve the problem, reality intercedes.
Generally, $\bold r_d$ has different physical units than $\bold r_m$
(likewise $\bold g_d$ and $\bold g_m$)
and we cannot allow our solution
to depend on the accidental choice of units
in which we express the problem.
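\par
The two suggestions translate directly into a few lines of code.
The sketch below assumes the residuals, and for the second rule dense matrices
for $\bold L$, $\bold W$, and $\bold A$, are available as NumPy arrays;
in a real problem the operators would be applied as subroutines
rather than stored as matrices.
\begin{verbatim}
# Sketch of the two epsilon heuristics (operators assumed available
# as dense NumPy arrays purely for illustration).
import numpy as np

def eps_by_residuals(r_d, r_m):
    # first suggestion: equal energy in data fitting and model styling
    return np.sqrt(np.dot(r_d, r_d) / np.dot(r_m, r_m))

def eps_by_gradients(L, W, A, r_d, r_m):
    # second suggestion: balance the "forces" g_d = L'W'r_d, g_m = A'r_m
    g_d = L.T @ (W.T @ r_d)
    g_m = A.T @ r_m
    return np.sqrt(np.dot(g_d, g_d) / np.dot(g_m, g_m))
\end{verbatim}
Note that the first rule automatically gives $\epsilon$ the units of
$\bold r_d$ divided by those of $\bold r_m$,
so the sum $\bold r_d \cdot \bold r_d + \epsilon^2\, \bold r_m \cdot \bold r_m$
remains dimensionally consistent whatever units we happen to choose.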
\par
I have had much experience choosing $\epsilon$, but it is
only recently that I boiled it down to the above two suggestions.
Normally I also try other values, like double or half those
of the above choices,
and I examine the solutions for subjective appearance.
If you find any insightful examples, please tell me about them.
\par
Computationally, we could choose a new $\epsilon$ with each iteration,
but it is more expeditious
to freeze $\epsilon$, solve the problem,
recompute $\epsilon$, and solve the problem again.
I have never seen a case where more than one such repetition was necessary.
\par
People who work with small problems
(less than about $10^3$ vector components)
have access to an attractive theoretical approach
called cross-validation.
Simply speaking,
we could solve the problem many times,
each time omitting a different data value.
Each solution would provide a model
that could be used to predict
the omitted data value.
The quality of these predictions
is a function of $\epsilon$,
and this provides a guide to finding it.
My objections to cross-validation are two-fold:
First, I don't know how to apply it in the large problems
like those we solve in this book
(I should think more about it);
and second,
people who worry much about $\epsilon$
should perhaps first think more carefully about
their choice of the filters $\bold W$ and $\bold A$,
which is the focus of this book.
Notice that both $\bold W$ and $\bold A$
can be defined with a scaling factor, which is like scaling $\epsilon$.
Often more important in practice,
with $\bold W$ and $\bold A$
we have a scaling factor that need not be constant but
can be a function of space or spatial frequency
within the data space and/or model space.
\begin{comment}
\section{References}
\reference{Gill, P.E., Murray, W., and Wright, M.H., 1981,
	Practical optimization: Academic Press.
	}
\reference{Hestenes, M.R., and Stiefel, E., 1952,
	Methods of conjugate gradients for solving linear systems:
	J. Res. Natl. Bur. Stand., {\bf 49}, 409-436.
	}
\reference{Luenberger, D.G., 1973,
	Introduction to linear and nonlinear programming: Addison-Wesley.
	}
\reference{Nolet, G., 1985,
	Solving or resolving inadequate and noisy tomographic systems:
	J. Comp. Phys., {\bf 61}, 463-482.
	}
\reference{Paige, C.C., and Saunders, M.A., 1982a,
	LSQR: an algorithm for sparse linear equations and sparse least squares:
	Assn. Comp. Mach. Trans. Mathematical Software, {\bf 8}, 43-71.
	}
\reference{Paige, C.C., and Saunders, M.A., 1982b,
	Algorithm 583, LSQR: sparse linear equations and least squares problems:
	Assn. Comp. Mach. Trans. Mathematical Software, {\bf 8}, 195-209.
	}
%\reference{Press, W.H. et al., 1989,
%	Numerical recipes: the art of scientific computing:
%	Cambridge University Press.
%	}
\reference{Strang, G., 1986,
	Introduction to applied mathematics: Wellesley-Cambridge Press.
	}
\end{comment}
\clearpage