%\newslide
\par\noindent\footnotesize
\begin{verbatim}
 d(m)                        F(m,n)                       iter    Norm
-----  -------------------------------------------------  ----  -----------
  41.   -6. -18.  -7.  -5. -36.  37. -19. -15.  21. -55.    1  11.59544849
  33.    1. -17.  22.  35. -20.  -2. -20.  23. -59.  50.    2   6.97337770
 -58.    8. -10.  24.  18. -26. -31.   6.  69.  69.  50.    3   5.64414406
   0.   10.   0.   0.   0.   0.   0.   0.   0.   0.   0.    4   4.32118177
   0.    0.  20.   0.   0.   0.   0.   0.   0.   0.   0.    5   2.64755201
   0.    0.   0.  30.   0.   0.   0.   0.   0.   0.   0.    6   2.01631355
   0.    0.   0.   0.  40.   0.   0.   0.   0.   0.   0.    7   1.23219979
   0.    0.   0.   0.   0.  50.   0.   0.   0.   0.   0.    8   0.36649203
   0.    0.   0.   0.   0.   0.  60.   0.   0.   0.   0.    9   0.28528941
   0.    0.   0.   0.   0.   0.   0.  70.   0.   0.   0.   10   0.06712411
   0.    0.   0.   0.   0.   0.   0.   0.  80.   0.   0.   11   0.00374284
   0.    0.   0.   0.   0.   0.   0.   0.   0.  90.   0.   12  -0.00000040
   0.    0.   0.   0.   0.   0.   0.   0.   0.   0. 100.   13   0.00000000
\end{verbatim}
\normalsize
We observe that solving the same problem for the scaled variables
has required a severe increase in the number of iterations
needed to reach the solution.
We lost the benefit of the second CG miracle.
Even the rapid convergence predicted for the 10th iteration
is delayed until the 12th.
%\end{notforlecture}
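\par
The experiment tabulated above is easy to repeat.
What follows is a minimal sketch, not the book's code:
plain conjugate gradients on the normal equations (CGLS),
written in Python with NumPy,
applied to a matrix of the same shape as the one above
(three random rows followed by the badly scaled diagonal
$10, 20, \ldots, 100$).
The random entries, and hence the printed norms, differ from the table,
but the slow convergence caused by the spread of diagonal scales reappears.
\begin{verbatim}
import numpy as np

# Three random rows, then the badly scaled diagonal 10,20,...,100.
rng = np.random.default_rng(0)
F = np.vstack([rng.integers(-60, 70, size=(3, 10)).astype(float),
               np.diag(np.arange(10.0, 101.0, 10.0))])
d = F @ rng.standard_normal(10)      # consistent synthetic data

m = np.zeros(10)                     # CGLS for  0 ~ F m - d
r = d - F @ m                        # data residual
s = F.T @ r                          # gradient
q = s.copy()                         # search direction
for it in range(13):
    Fq = F @ q
    alpha = (s @ s) / (Fq @ Fq)      # step size
    m += alpha * q
    r -= alpha * Fq
    s_new = F.T @ r
    q = s_new + ((s_new @ s_new) / (s @ s)) * q
    s = s_new
    print(it + 1, np.linalg.norm(r))
    if np.linalg.norm(r) < 1e-12:    # stop once converged
        break
\end{verbatim}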
\subsection{Statistical interpretation}
This book is not a statistics book.
Nevertheless, many of you have enough statistical knowledge
to permit a statistical interpretation
of these views of preconditioning.
\par
A statistical concept is that we can combine many streams
of random numbers into a composite model.
Each stream of random numbers is generally taken
to be uncorrelated with the others,
to have zero mean, and to have the same variance
as all the others.
This is often abbreviated as IID, denoting
Independent, Identically Distributed.
Linear combinations,
such as filtering and weighting operations,
of these IID random streams
can build correlated random functions much like those
observed in geophysics.
A geophysical practitioner seeks to do the inverse:
to operate on the correlated, unequal random variables
and recover the statistically ideal IID streams.
The identity matrix required for the ``second miracle''
and our search for a good preconditioning transformation
are related ideas.
The relationship will become clearer in chapter \ref{mda/paper:mda},
when we learn how to estimate the best roughening operator $\bold A$
as a prediction-error filter.
\par
\boxit{
Two philosophies to find a preconditioner:
\begin{enumerate}
\item Dream up a smoothing operator $\bold S$.
\item Estimate a prediction-error filter $\bold A$,
      and then use its inverse $\bold S = \bold A^{-1}$.
\end{enumerate}
}
\par
\boxit{
Deconvolution on a helix is an all-purpose preconditioning
strategy for multidimensional model regularization.
}
\par
The outstanding acceleration of convergence by preconditioning
suggests that the philosophy of image creation by optimization
has a dual orthonormality.
First, Gauss (and common sense) tells us that the data residuals
should be roughly equal in size.
Likewise, in Fourier space they should be roughly equal in size,
which means they should be roughly white, i.e., orthonormal.
(I use the word ``orthonormal''
because white means the autocorrelation is an impulse,
which means the signal is statistically orthogonal
to shifted versions of itself.)
Second, to speed convergence of iterative methods,
we need a whiteness, another orthonormality, in the solution.
The map image, the physical function that we seek, might
not itself be white, so we should solve first for another variable,
the whitened map image, and, as a final step,
transform it to the ``natural colored'' map.
\subsection{The preconditioned solver}
Summing up the ideas above,
we start from the fitting goals
\begin{equation}
\label{eq:main}
\begin{array}{lll}
\bold 0 &\approx& \bold F \bold m \ -\ \bold d \\
\bold 0 &\approx& \bold A \bold m
\end{array}
\end{equation}
and we change variables from $\bold m$ to $\bold p$ using
$\bold m = \bold A^{-1} \bold p$:
\begin{equation}
\label{eqn:precsummary}
\begin{array}{llllcl}
\bold 0 &\approx & \bold F \bold m \ -\ \bold d &=& \bold F \bold A^{-1} &\bold p \ -\ \bold d \\
\bold 0 &\approx & \bold A \bold m              &=& \bold I              &\bold p
\end{array}
\end{equation}
Preconditioning means iteratively fitting
by adjusting the $\bold p$ variables
and then finding the model by using
$\bold m = \bold A^{-1} \bold p$.
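\par
The book's reusable module for this task is omitted from the present draft.
As a stand-in, here is a minimal sketch of the idea in Python with NumPy,
for dense matrices; the function name and argument list are mine,
not the book's interface.
It stacks the two goals of (\ref{eqn:precsummary}), iterates on $\bold p$
with plain CGLS, and only at the end forms $\bold m = \bold A^{-1}\bold p$.
\begin{verbatim}
import numpy as np

def solver_prc(F, Ainv, d, eps=1.0, niter=20):
    """Fit 0 ~ F Ainv p - d and 0 ~ eps p by CGLS on the
    stacked system, then return m = Ainv p."""
    n = Ainv.shape[1]
    G = np.vstack([F @ Ainv, eps * np.eye(n)])
    rhs = np.concatenate([d, np.zeros(n)])
    p = np.zeros(n)
    r = rhs - G @ p                  # stacked residual (r_d, r_m)
    s = G.T @ r                      # gradient with respect to p
    q = s.copy()                     # search direction
    for _ in range(niter):
        Gq = G @ q
        alpha = (s @ s) / (Gq @ Gq)
        p += alpha * q
        r -= alpha * Gq
        s_new = G.T @ r
        q = s_new + ((s_new @ s_new) / (s @ s)) * q
        s = s_new
    return Ainv @ p                  # final change of variables
\end{verbatim}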
\begin{comment}
A new reusable preconditioned solver is
the module \texttt{solver_prc} \vpageref{lst:solver_prc}.
%The variable \texttt{x} in \texttt{prec\_solver} refers to $\bold m$.
Likewise, the modeling operator $\bold F$ is called \texttt{Fop},
and the smoothing operator $\bold A^{-1}$ is called \texttt{Sop}.
Details of the code are only slightly different from
the regularized solver \texttt{solver_reg} \vpageref{lst:solver_reg}.
%REMEMBER TO PUT THE PROGRAM BACK IN HERE !
%\begin{notforlecture}
\moddex{solver_prc}{Preconditioned solver}
%\end{notforlecture}
\end{comment}
\section{OPPORTUNITIES FOR SMART DIRECTIONS}
Recall the fitting goals (\ref{eqn:precsummary}):
\begin{equation}
\begin{array}{llllllcl}
\bold 0 &\approx& \bold r_d &=& \bold F \bold m \ -\ \bold d &=& \bold F \bold A^{-1} &\bold p \ -\ \bold d \\
\bold 0 &\approx& \bold r_m &=& \bold A \bold m              &=& \bold I              &\bold p
\end{array}
\end{equation}
Without preconditioning, we have the search direction
\begin{equation}
\Delta \bold m_{\rm bad} \eq
\left[ \begin{array}{cc} \bold F' & \bold A' \end{array} \right]
\left[ \begin{array}{c}  \bold r_d \\ \bold r_m \end{array} \right]
\end{equation}
and with preconditioning we have the search direction
\begin{equation}
\Delta \bold p_{\rm good} \eq
\left[ \begin{array}{cc} (\bold F\bold A^{-1})' & \bold I \end{array} \right]
\left[ \begin{array}{c}  \bold r_d \\ \bold r_m \end{array} \right]
\end{equation}
\par
The essential feature of preconditioning is not that we perform
the iterative optimization in terms of the variable $\bold p$.
The essential feature is that we use a search direction
that is a gradient with respect to $\bold p'$, not $\bold m'$.
Using $\bold A\bold m=\bold p$, we have
$\bold A\,\Delta \bold m=\Delta \bold p$.
This enables us to define a good search direction in model space:
\begin{equation}
\Delta \bold m_{\rm good} \eq \bold A^{-1}\Delta \bold p_{\rm good}
\eq \bold A^{-1} (\bold A^{-1})' \bold F' \bold r_d + \bold A^{-1} \bold r_m
\end{equation}
Define the gradient by $\bold g=\bold F'\bold r_d$, and
notice that $\bold r_m=\bold p$, so that $\bold A^{-1}\bold r_m = \bold m$:
\begin{equation}
\Delta \bold m_{\rm good} \eq \bold A^{-1} (\bold A^{-1})' \ \bold g
% { \bold g \over \bold A \bold A'}
 + \bold m
\label{eqn:newdirection}
\end{equation}
\par
The search direction (\ref{eqn:newdirection}) shows a positive-definite
operator scaling the gradient.
The components of a gradient vector are independent of one another;
each independently points a direction for descent,
and obviously each can be scaled by any positive number.
Now we have found that we can also scale a gradient vector by
a positive-definite matrix and still expect
the conjugate-direction algorithm to descend, as always,
to the ``exact'' answer in a finite number of steps.
This is because modifying the search direction with
$\bold A^{-1} (\bold A^{-1})'$ is equivalent to solving
a conjugate-gradient problem in $\bold p$.
%\subsubsection{Implications for better preconditioning}
%\par
%The search direction (\ref{eqn:newdirection})
%brings us both good news and bad news.
%First the bad news.
%Equation (\ref{eqn:newdirection}) looks like something new,
%but it will probably turn out to be equivalent to all the
%preconditioning work we have done with the operator $\bold A^{-1}$
%being polynomial division.
%Actually, we do polynomial division
%followed by truncation of the infinite series.
%Because of the truncation, we really do not properly construct
%$\bold A^{-1} (\bold A^{-1})'$, and for this we can expect
%some difficulty at one boundary.
%\par
%The good news is that we now see new preconditioning opportunities.
%Formerly we began from a preconditioning
%operator $\bold A^{-1}$, but now we see that we really only
%need its spectrum $\bold A^{-1} (\bold A^{-1})'$.
%That might not be any real advantage, since our basic representation
%of a positive definite matrix is any matrix times its transpose.
%Anyway, we can conclude that the operator $\bold A$
%may be chosen as any cascade of weighting and filtering operators.
%
%\par
%We are also reminded that what we really need
%is not simply $\bold A^{-1} (\bold A^{-1})'$
%but $(\bold F'\bold F+\bold A'\bold A)^{-1}$.
%Likewise, we imagine this being built of
%any cascade of weighting and filtering operators
%followed by the adjoint.
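\par
Equation (\ref{eqn:newdirection}) is easy to verify numerically.
The sketch below (Python with NumPy; the toy operators are my own choices,
not from the book) builds a random $\bold F$ and a simple roughener
$\bold A$, and checks that the direction computed in $\bold p$ and mapped
back with $\bold A^{-1}$ equals
$\bold A^{-1}(\bold A^{-1})'\,\bold g + \bold m$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((12, 8))
A = np.eye(8) - 0.9 * np.diag(np.ones(7), -1)   # toy roughener
Ainv = np.linalg.inv(A)
d = rng.standard_normal(12)
m = rng.standard_normal(8)

p = A @ m
r_d = F @ m - d                      # data residual
r_m = p                              # model residual equals p

# Direction computed in p-space, mapped back to model space ...
dm_from_p = Ainv @ ((F @ Ainv).T @ r_d + r_m)
# ... versus the scaled-gradient form of the equation above.
g = F.T @ r_d
dm_good = Ainv @ Ainv.T @ g + m
print(np.allclose(dm_from_p, dm_good))          # True
\end{verbatim}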
\section{NULL SPACE AND INTERVAL VELOCITY}
A bread-and-butter problem in seismology is building the velocity
as a function of depth (or vertical travel time)
starting from certain measurements.
The measurements are described elsewhere (BEI, for example).
They amount to measuring the integral of the velocity squared
from the surface down to the reflector.
It is known as the RMS (root-mean-square) velocity.
Although good-quality echoes may arrive often,
they rarely arrive continuously for all depths.
Good information is interspersed unpredictably with poor information.
Luckily, we can also estimate the data quality
by the ``coherency'' or the ``stack energy.''
In summary, what we get from observations and preprocessing
are two functions of travel-time depth:
(1) the integrated (from the surface) squared velocity, and
(2) a measure of the quality of the integrated velocity measurement.
Some definitions:
\begin{description}
\item [$\bold d$]
is a data vector whose components range over the vertical
traveltime depth $\tau$,
and whose component values contain the scaled RMS velocity squared
$\tau v_{\rm RMS}^2/\Delta \tau$,
where $\tau /\Delta \tau$ is the index on the time axis.
\item [$\bold W$]
is a diagonal matrix along which we lay the given measure
of data quality.
We will use it as a weighting function.
\item [$\bold C$]
is the matrix of causal integration, a lower triangular matrix of ones.
\item [$\bold D$]
is the matrix of causal differentiation, namely, $\bold D=\bold C^{-1}$.
\item [$\bold u$]
is a vector whose components range over the vertical
traveltime depth $\tau$,
and whose component values contain the interval velocity squared
$v_{\rm interval}^2$.
\end{description}
From these definitions,
under the assumption of a stratified earth with horizontal reflectors
(and no multiple reflections),
the theoretical (squared) interval velocities
enable us to define the theoretical (squared) RMS velocities by
\begin{equation}
\bold C\bold u \eq \bold d
\end{equation}
With imperfect data, our data-fitting goal is to minimize the residual
\begin{equation}
\bold 0 \quad\approx\quad \bold W\left[\bold C\bold u-\bold d\right]
\end{equation}
\par
To find the interval velocity
where there is no data (where the stack power theoretically vanishes),
we have the ``model damping'' goal of minimizing
the wiggliness $\bold p$
of the squared interval velocity $\bold u$:
\begin{equation}
\bold 0 \quad\approx\quad \bold D \bold u \eq \bold p
\end{equation}
\par
We precondition these two goals
by changing the optimization variable from
the interval velocity squared $\bold u$ to its wiggliness $\bold p$.
Substituting $\bold u=\bold C\bold p$ gives the two goals
expressed as a function of the wiggliness $\bold p$:
\begin{eqnarray}
\label{eqn:model}
\bold 0 &\approx& \bold W\left[\bold C^2\bold p-\bold d\right] \\
\bold 0 &\approx& \epsilon \; \bold p
\end{eqnarray}
\subsection{Balancing good data with bad}
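\par
A small numerical sketch shows the preconditioned goals (\ref{eqn:model})
at work where good and bad data coexist.
Everything here (Python with NumPy, the toy velocity model, the gap of
zero-weight data, and the choice of $\epsilon$) is my own illustration,
not from the book.
Where the weight vanishes, the data goal is silent and the damping goal
$\bold 0 \approx \epsilon\,\bold p$ keeps the wiggliness small,
so the interval velocity is carried smoothly across the gap.
\begin{verbatim}
import numpy as np

n = 50
C = np.tril(np.ones((n, n)))          # causal integration
u_true = 4.0 + 0.02 * np.arange(n)    # toy squared interval velocity
d = C @ u_true                        # synthetic scaled RMS data

w = np.ones(n)
w[20:35] = 0.0                        # a stretch of worthless data
W = np.diag(w)

eps = 0.1
G = np.vstack([W @ C @ C, eps * np.eye(n)])   # 0 ~ W(C^2 p - d), 0 ~ eps p
rhs = np.concatenate([W @ d, np.zeros(n)])
p = np.linalg.lstsq(G, rhs, rcond=None)[0]
u = C @ p                             # back to interval velocity squared
print(np.abs(u - u_true).max())       # modest error; the gap is bridged
\end{verbatim}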