%\begin{array}{c}
%\bold p_0 \\
%\bold 0
%\end{array}
%\right]
%\end{equation}
%and
%\begin{equation}
%\hat{\bold r}_0  \eq
%      \bold F\bold S \bold p_0 -\bold d \eq
%      \bold F        \bold m_0 -\bold d
%\end{equation}
%As the iteration begins we have gradients
%of the two parts of the model
%\begin{eqnarray}
%\bold g_m &=& \bold S' \bold F' \hat{\bold r} \\
%\bold g_d &=& \epsilon \, \hat{\bold r}
%\end{eqnarray}
%which imply a perturbation in the theoretically zero residual
%\begin{equation}
%\Delta\hat{\bold r}  \eq
%      \bold F\bold S \bold g_m + \epsilon\; \bold g_d
%\end{equation}
%Then step sizes and steps are determined
%as usual for conjugate-direction fitting.
%\par
%The preconditioning module \texttt{vr\_solver}
%%\vpageref{lst:vrsolver}
%takes three functions as its arguments.
%Functions \texttt{oper} and \texttt{prec} correspond to the linear
%operators $\bold F$ and $\bold S$.
%Function \texttt{solv} implements one step of an optimization descent.
%Its arguments are a logical parameter \texttt{forget},
%which controls a conditional restarting of the optimization,
%the current effective model \texttt{x0},
%the gradient vector \texttt{g}, the data residual vector \texttt{rr}, and
%the conjugate gradient vector \texttt{gg}.
%Subroutine \texttt{solver\_vr} constructs
%the effective model vector \texttt{x},
%which consists of the model-space part \texttt{xm}
%and the data-space part \texttt{xd}.
%Similarly, the effective gradient vector \texttt{g}
%is split into the model-space part \texttt{gm}
%and the data-space part \texttt{gd}.
%\par
%For brevity I am omitting the code
%for the virtual-residual solver \texttt{vrsolver},
%which is in the library.
%It follows the same pattern as
%\texttt{prec\_solver} \vpageref{lst:precsolver}.
%
%\moddex{vrsolver}{Virtual-residual solver}
\section{SCALING THE ADJOINT}
First I remind you of a rarely used little bit of mathematical notation.
Given a vector $\bold m$ with components $(m_1,m_2,m_3)$,
the notation ${\bf diag\ }\bold m$ means
\begin{equation}
{\bf diag\ }\bold m \eq
	\left[
	\begin{array}{ccc}
		m_1 &  0  & 0
		\\
		0   & m_2 & 0
		\\
		0   &  0  & m_3
	\end{array}
	\right]
\end{equation}
\par
Given the usual linearized fitting goal between
data space and model space, $\bold d \approx \bold F \bold m$,
the simplest image of the model space results from
application of the adjoint operator $\hat{\bold m} = \bold F' \bold d$.
Unless $\bold F$ has no physical units, however,
the physical units of $\hat{\bold m}$ do not match those of $\bold m$,
so we need a scaling factor.
The theoretical solution
$\bold m_{\rm theor} = (\bold F'\bold F)^{-1}\bold F'\bold d$
tells us that the scaling units should be those of $(\bold F'\bold F)^{-1}$.
We are going to approximate $(\bold F'\bold F)^{-1}$ by a diagonal matrix
$\bold W^2$ with the correct units, so
$\hat{\bold m} = \bold W^2 \bold F'\bold d$.
\par
What we use for $\bold W$ will be a guess, simply a guess.
If it works better than nothing, we'll be happy,
and if it doesn't, we'll forget about it.
Experience shows it is a good idea to try.
Common sense tells us to insist that all elements of $\bold W^2$ be positive.
$\bold W^2$ is a square matrix of the size of model space.
From any vector $\tilde{\bold m}$ in model space with all positive components,
we could guess $\bold W^2$ to be
${\bf diag\ }\tilde{\bold m}$ raised to any power.
To get the right physical dimensions we take
$\tilde{\bold m}=\bold 1$, a vector of all ones, and choose
\begin{equation}
\bold W^2 \eq
	{
		1
			\over
		{\bf diag\ } {\bold F'\bold F\bold 1}
	}
\label{eqn:adjointwt}
\end{equation}
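\par
For concreteness, here is a NumPy sketch of the guess~(\ref{eqn:adjointwt})
and the resulting scaled adjoint.
It is an illustration only:
a random dense matrix stands in for a real operator $\bold F$,
and the absolute value anticipates the sign problem discussed next.
\begin{verbatim}
import numpy as np

# Estimate diag(F'F) by applying F'F to a vector of ones (one forward
# and one adjoint application), then scale the adjoint image.
rng = np.random.default_rng(0)
F = rng.standard_normal((500, 200))   # toy operator: 500 data, 200 model
d = F @ rng.standard_normal(200)      # synthetic data

diagFtF = F.T @ (F @ np.ones(200))    # diag( F'F 1 )
w2 = 1.0 / np.abs(diagFtF)            # "square root of the squares" guard
mhat = w2 * (F.T @ d)                 # scaled adjoint:  m = W^2 F' d
\end{verbatim}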
A problem with the choice~(\ref{eqn:adjointwt}) is that
some components might be zero or negative.
Well, we can take the square root of the squares of components
and/or smooth the result.
\par
To go beyond the scaled adjoint we can use $\bold W$ as a \bx{precondition}er.
To use $\bold W$ as a preconditioner
we define implicitly a new set of variables $\bold p$
by the substitution $\bold m=\bold W\bold p$.
Then $\bold d \approx \bold F\bold m=\bold F\bold W\bold p$.
To find $\bold p$ instead of $\bold m$,
we iterate
with the operator $\bold F\bold W$ instead of with $\bold F$.
As usual, the first step of the iteration is to use the adjoint
of $\bold d\approx \bold F\bold W\bold p$ to form the image
$\hat{\bold p}=(\bold F\bold W)'\bold d$.
At the end of the iterations,
we convert from $\bold p$ back to $\bold m$
with $\bold m=\bold W\bold p$.
The result after the first iteration,
$\hat{\bold m}=\bold W\hat{\bold p}=\bold W(\bold F\bold W)'\bold d=\bold W^2\bold F'\bold d$,
turns out to be the same as scaling.
\par
By (\ref{eqn:adjointwt}), $\bold W$ has physical units inverse to $\bold F$.
Thus the transformation $\bold F\bold W$ has no units,
so the $\bold p$ variables have the physical units of data space.
Experimentalists might enjoy seeing the solution $\bold p$
with its data units more than viewing the solution $\bold m$
with its more theoretical model units.
\par
The theoretical solution for underdetermined systems,
	$\bold m =\bold F' (\bold F \bold F')^{-1} \bold d$,
suggests
an alternate approach using instead
	$\hat{\bold m} =\bold F' \bold W_d^2 \bold d$.
This diagonal weighting matrix $\bold W_d^2$ must be drawn
from vectors in data space.
Again I choose a vector of all 1's, getting the weight
\begin{equation}
\bold W_d^2 \eq
	{
		1
			\over
		{\bf diag\ } {\bold F\bold F'\bold 1}
	}
\label{eqn:datawtadjoint}
\end{equation}
\par
My choice of a vector of 1's is quite arbitrary.
I might as well have chosen a vector of random numbers.
Bill Symes, who suggested this approach to me,
suggests using an observed data vector $\bold d$ for the data-space weight,
and $\bold F' \bold d$ for the model-space weight.
This requires an additional step, dividing out the units of the data $\bold d$.
\par
Experience tells me that a broader methodology than all of the above is needed.
Appropriate scaling is required in both data space and model space.
We need two other weights,
$\bold W_m$ and $\bold W_d$, where
$\hat{\bold m} = \bold W_m \bold F'\bold W_d \bold d$.
\par
I have a useful practical example (stacking in $v(z)$ media)
in another of my electronic books (BEI),
where I found both
$\bold W_m$ and $\bold W_d$ by iterative guessing.
First assume $\bold W_d =\bold I$ and estimate $\bold W_m$ as above.
Then assume you have the correct $\bold W_m$ and estimate $\bold W_d$ as above.
Iterate.
(Perhaps some theorist can find a noniterative solution.)
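\par
In the same toy NumPy setting,
the alternation just described might be sketched as follows
(\texttt{alternating\_weights} is a hypothetical helper, not the BEI code;
a small \texttt{eps} and absolute values guard the divisions):
\begin{verbatim}
import numpy as np

def alternating_weights(F, niter=5, eps=1e-12):
    """Alternately guess diagonal weights W_m and W_d (as vectors)."""
    nd, nm = F.shape
    wd2 = np.ones(nd)                 # start from W_d = I
    wm2 = np.ones(nm)
    for _ in range(niter):
        # W_m^2 = 1 / diag( (W_d F)'(W_d F) 1 ) = 1 / diag( F' W_d^2 F 1 )
        wm2 = 1.0 / (np.abs(F.T @ (wd2 * (F @ np.ones(nm)))) + eps)
        # W_d^2 = 1 / diag( (F W_m)(F W_m)' 1 ) = 1 / diag( F W_m^2 F' 1 )
        wd2 = 1.0 / (np.abs(F @ (wm2 * (F.T @ np.ones(nd)))) + eps)
    return np.sqrt(wm2), np.sqrt(wd2)
\end{verbatim}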
I believe this iterative procedure leads us to the best diagonal
pre- and post-multipliers for any operator $\bold F$.
By this I mean that the modified operator $(\bold W_d \bold F \bold W_m)$
is as close to being unitary as we will be able to obtain
with diagonal transformations.
Unitary means that it is energy conserving and that its inverse
is simply its conjugate transpose.
\par
What good is it that
$(\bold W_d \bold F \bold W_m)'(\bold W_d \bold F \bold W_m) \approx \bold I$?
It gives us the most rapid convergence of least-squares problems of the form
\begin{equation}
\label{eqn:genform}
\bold 0 \quad\approx\quad \bold W_d ( \bold F\bold m - \bold d)
\eq \bold W_d ( \bold F\bold W_m \bold p - \bold d)
\end{equation}
Thus it defines for us the best
diagonal transform to a preconditioning variable
$\bold p=\bold W_m^{-1}\bold m$ to use during iteration,
and it suggests what residual weighting function we need if
rapid convergence is a high priority.
Suppose we are not satisfied with $\bold W_d$ being the weighting function
for residuals.
Equation~(\ref{eqn:genform}) could still help us speed iteration:
instead of beginning iteration with $\bold p=\bold 0$,
we could begin from the solution $\bold p$
of the regression~(\ref{eqn:genform}).
\par
The PhD thesis of James Rickett experiments extensively
with data-space and model-space weighting functions
in the context of seismic velocity estimation.
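\par
Continuing the dense-matrix sketch
(\texttt{solve\_genform} is hypothetical;
\texttt{wm} and \texttt{wd} hold the diagonals of $\bold W_m$ and $\bold W_d$),
regression~(\ref{eqn:genform}) might be set up and solved like this:
\begin{verbatim}
import numpy as np

def solve_genform(F, d, wm, wd):
    """Fit 0 ~ W_d (F W_m p - d); return p and the model m = W_m p."""
    A = wd[:, None] * F * wm[None, :]      # W_d F W_m as one dense matrix
    p, *_ = np.linalg.lstsq(A, wd * d, rcond=None)
    return p, wm * p
\end{verbatim}
The returned $\bold p$ is the warm start suggested above
for further iteration in the preconditioned variables.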
\section{A FORMAL DEFINITION FOR ADJOINTS}
In mathematics, adjoints are defined a little differently than
we have defined them here
(as matrix transposes).\footnote{
	I would like to thank Albert Tarantola for suggesting this topic.
	}
The mathematician begins by telling us that we cannot simply
form any dot product we want.
We are not allowed to take the dot product of any two vectors
in model space,
$\bold m_1 \cdot \bold m_2$, or data space,
$\bold d_1 \cdot \bold d_2$.
Instead, we must first transform them to a preferred coordinate system.
Say
$\tilde{\bold m}_1 = \bold M \bold m_1$ and
$\tilde{\bold d}_1 = \bold D \bold d_1$, etc.\ for other vectors.
We complain that we do not know $\bold M$ and $\bold D$.
They reply that we do not really need to know them,
but we do need to have the inverses (aack!) of
$\bold M'\bold M$ and
$\bold D'\bold D$.
A pre-existing common notation is
$\bold\sigma_m^{-2} = \bold M'\bold M$ and
$\bold\sigma_d^{-2} = \bold D'\bold D$.
Now the mathematician buries the mysterious new positive-definite
matrix inverses in the definition of dot product:
$<\bold m_1,\bold m_2> = \bold m'_1 \bold M'\bold M \bold m_2
 = \bold m'_1 \bold\sigma_m^{-2} \bold m_2$,
and likewise with
$<\bold d_1,\bold d_2>$.
This suggests a total reorganization of our programs.
Instead of computing
$(\bold m'_1 \bold M') (\bold M \bold m_2)$,
we could compute $\bold m'_1 (\bold\sigma_m^{-2} \bold m_2)$.
Indeed, this is the ``conventional'' approach.
This definition of dot product would be buried in the solver code.
The other thing that would change would be the search direction
$\Delta \bold m$.
Instead of being the gradient as we have defined it,
$\Delta \bold m=\bold L'\bold r$,
it would be
$\Delta \bold m=\bold\sigma_m^{-2} \bold L'\bold\sigma_d^{-2}\bold r$.
A mathematician would
{\em define} the adjoint of $\bold L$ to be
$\bold\sigma_m^{-2} \bold L'\bold\sigma_d^{-2}$.
(Here $\bold L'$ remains the matrix transpose.)
You might notice that this approach nicely incorporates
both residual weighting and preconditioning
while evading the issue of where we get the matrices
$\bold\sigma_m^2$ and $\bold\sigma_d^2$ or how we invert them.
Fortunately, the upcoming chapter~\ref{mda/paper:mda}
suggests how,
in image estimation problems,
to obtain sensible estimates of the elusive operators $\bold M$ and $\bold D$.
Parenthetically, modeling calculations in physics and engineering
often use similar mathematics,
in which the role of $\bold M'\bold M$ is not so mysterious.
Kinetic energy is proportional to mass times velocity squared,
so mass can play the role of $\bold M'\bold M$.
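\par
As a purely illustrative sketch, suppose
$\bold\sigma_m^{-2}$ and $\bold\sigma_d^{-2}$ happen to be diagonal
(in general they are full positive-definite matrices),
stored as the vectors \texttt{sig\_m2inv} and \texttt{sig\_d2inv}:
\begin{verbatim}
import numpy as np

def weighted_dot(m1, m2, sig_m2inv):
    # <m1, m2> = m1' sigma_m^{-2} m2
    return m1 @ (sig_m2inv * m2)

def formal_adjoint_apply(L, r, sig_m2inv, sig_d2inv):
    # The mathematician's adjoint of L applied to a residual r:
    # Delta m = sigma_m^{-2} L' sigma_d^{-2} r   (L' is plain transpose)
    return sig_m2inv * (L.T @ (sig_d2inv * r))
\end{verbatim}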
\par
So, should we continue to use
$(\bold m'_1 \bold M') (\bold M \bold m_2)$,
or should we take the conventional route and go with
$\bold m'_1 (\bold\sigma_m^{-2} \bold m_2)$?
One day while benchmarking a wide variety of computers I was shocked
to see some widely differing numerical results.
Now I know why.
Consider adding $10^7$ identical positive floating-point numbers, say 1.0's,
in an arithmetic with a precision of $10^{-6}$.
After you have added in the first $10^6$ numbers,
the rest will all truncate in the roundoff,
and your sum will be wrong by a factor of ten.
If the numbers were added in pairs,
and then the pairs added, etc., there would be no difficulty.
Precision is scary stuff!
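\par
You can watch this happen in IEEE single precision
(relative precision about $10^{-7}$ rather than the $10^{-6}$ above,
so I use $2\times 10^7$ ones):
NumPy's \texttt{cumsum} accumulates sequentially,
while its \texttt{sum} adds pairwise.
\begin{verbatim}
import numpy as np

ones = np.ones(2 * 10**7, dtype=np.float32)

naive  = np.cumsum(ones)[-1]   # running sum stalls at 2^24 = 16777216.0
paired = ones.sum()            # pairwise summation gives 20000000.0
print(naive, paired)
\end{verbatim}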
\par
It is my understanding and belief that there is nothing wrong
with the approach of this book; in fact,
it seems to have some definite advantages.
While the conventional approach requires one
to compute the adjoint correctly, we do not.
The method of this book
(which I believe is properly called conjugate directions)
has a robustness that, I'm told,
has been superior in some important geophysical applications.
The conventional approach seems to get in trouble when
transpose operators are computed with insufficient precision.
%On the other hand, I can envision applications where
%the conventional approach would be preferable.
%The matrix $\bold M$ is the size of model space times the size of data space
%while the matrix $\bold\sigma_m^{-2}$ is the size of model space squared.
%Clearly, there is the potential for some applications to speed up
%by switching from our approach to the conventional approach.
%\section{ACKNOWLEDGEMENTS}
%Nearly everything I know about null spaces
%I learned from Dave Nichols, Bill Symes, Bill Harlan, and Sergey Fomel.
\unboldmath
%\activeplot{name}{width=6in,height=}{}{caption}
%\activesideplot{name}{height=1.5in,width=}{}{caption}
%
%\begin{equation}
%\label{eqn:}
%\end{equation}
%
%\ref{fig:}
%(\ref{eqn:})
%\end{notforlecture}
\clearpage