\end{array}\right]\label{eqn:cg1b}
\end{eqnarray}
\par
A \bx{contour plot} is based on an altitude function of space.
The altitude is the \bx{dot product} $\bold r \cdot \bold r$.
By finding the lowest altitude,
we are driving the residual vector $\bold r$ as close as we can to zero.
If the residual vector $\bold r$ reaches zero,
then we have solved the simultaneous equations $\bold d = \bold F \bold x$.
In a two-dimensional world the vector $\bold x$ has two components, $(x_1, x_2)$.
A contour is a curve of constant $\bold r \cdot \bold r$ in $(x_1, x_2)$-space.
These contours have a statistical interpretation as contours
of uncertainty in $(x_1, x_2)$, with measurement errors in $\bold d$.
\par
Let us see how a random search direction
can be used to reduce the residual
$\bold 0 \approx \bold r = \bold F \bold x - \bold d$.
Let $\Delta \bold x$ be an abstract vector
with the same number of components as the solution $\bold x$,
and let $\Delta \bold x$ contain arbitrary or random numbers.
We add an unknown quantity $\alpha$
of vector $\Delta \bold x$ to the vector $\bold x$,
and thereby create $\bold x_{\rm new}$:
\begin{equation}
\bold x_{\rm new} \eq \bold x + \alpha \Delta \bold x
\label{eqn:oldx}
\end{equation}
This gives a new residual:
\begin{eqnarray}
\bold r_{\rm new} &=& \bold F\ \bold x_{\rm new} - \bold d \\
\bold r_{\rm new} &=& \bold F (\bold x + \alpha \Delta \bold x) - \bold d \\
\bold r_{\rm new} \eq \bold r + \alpha \Delta \bold r
   &=& (\bold F \bold x - \bold d) + \alpha \bold F \Delta \bold x
\label{eqn:resupdatelin}
\end{eqnarray}
which defines $\Delta \bold r = \bold F \Delta \bold x$.
\par
Next we adjust $\alpha$ to minimize the dot product
$\bold r_{\rm new} \cdot \bold r_{\rm new}$:
\begin{equation}
(\bold r + \alpha \Delta \bold r) \cdot (\bold r + \alpha \Delta \bold r)
\eq
\bold r \cdot \bold r
 + 2\alpha (\bold r \cdot \Delta \bold r)
\ +\ \alpha^2 \, \Delta \bold r \cdot \Delta \bold r
\label{eqn:mindot}
\end{equation}
Set to zero its derivative with respect to $\alpha$ using the chain rule
\begin{equation}
0 \eq
(\bold r + \alpha \Delta \bold r) \cdot \Delta \bold r
\ +\ \Delta \bold r \cdot (\bold r + \alpha \Delta \bold r)
\eq
2 (\bold r + \alpha \Delta \bold r) \cdot \Delta \bold r
\label{eqn:newresperp}
\end{equation}
which says that the new residual
$\bold r_{\rm new} = \bold r + \alpha \Delta \bold r$
is perpendicular to the ``fitting function'' $\Delta \bold r$.
Solving gives the required value of $\alpha$.
\begin{equation}
\alpha \eq - \ { (\bold r \cdot \Delta \bold r ) \over
                 ( \Delta \bold r \cdot \Delta \bold r ) }
\label{eqn:alfa}
\end{equation}
\par
A ``computation \bx{template}'' for the method of random directions is
\def\padarrow{\quad\longleftarrow\quad}
\label{lsq/'randtemplate'}
\begin{tabbing}
mmmmmm \= mmmmmm \= mmmmm \kill
\> $\bold r \padarrow \bold F \bold x - \bold d$ \\
\> {\rm iterate \{ } \\
\>      \> $\Delta \bold x \padarrow {\rm random\ numbers}$ \\
\>      \> $\Delta \bold r\ \padarrow \bold F \ \Delta \bold x$ \\
\>      \> $\alpha \padarrow -( \bold r \cdot \Delta \bold r ) /
                              ( \Delta \bold r \cdot \Delta \bold r )$ \\
\>      \> $\bold x \padarrow \bold x + \alpha \Delta \bold x$ \\
\>      \> $\bold r\ \padarrow \bold r\ + \alpha \Delta \bold r$ \\
\>      \> \}
\end{tabbing}
A nice thing about the method of random directions is that you
do not need to know the adjoint operator $\bold F'$.
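\par
For readers who like to see the template in executable form,
here is a minimal NumPy sketch of the random-directions loop above.
It is only an illustration, not one of this book's programs:
the function names, the toy matrix, the seed, and the fixed iteration count
are our own choices, and a dense matrix merely stands in for the abstract operator $\bold F$.
\begin{verbatim}
import numpy as np

def random_directions(F, d, x0, niter=100, seed=0):
    # r <- F x - d ; then repeatedly:
    #   dx <- random numbers ; dr <- F dx
    #   alpha <- -(r.dr)/(dr.dr) ; x <- x + alpha dx ; r <- r + alpha dr
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    r = F @ x - d
    for _ in range(niter):
        dx = rng.standard_normal(x.shape)
        dr = F @ dx
        if dr @ dr == 0.0:        # direction happens to lie in the null space
            continue
        alpha = -(r @ dr) / (dr @ dr)
        x += alpha * dx
        r += alpha * dr
    return x, r

# Toy usage on a made-up 3 x 2 least-squares problem.
F = np.array([[1., 0.], [1., 1.], [1., 2.]])
d = np.array([1., 2., 4.])
x, r = random_directions(F, d, x0=np.zeros(2), niter=200)
print(r @ r)    # should be near the least-squares minimum of r.r
\end{verbatim}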
\par
In practice, random directions are rarely used.
It is more common to use the \bx{gradient} direction than a random direction.
Notice that a vector of the same size as $\Delta \bold x$ is
\begin{equation}
\bold g \eq \bold F' \bold r
\end{equation}
Notice also that this vector can be found by taking the gradient
of the size of the residuals:
\begin{equation}
{\partial \over \partial \bold x' } \ \bold r \cdot \bold r
\eq
{\partial \over \partial \bold x' } \
( \bold x' \, \bold F' \ -\ \bold d') \, ( \bold F \, \bold x \ -\ \bold d)
\eq
\bold F' \ \bold r
\end{equation}
Choosing $\Delta \bold x$ to be the gradient vector
$\Delta \bold x = \bold g = \bold F' \bold r$
is called ``the method of \bx{steepest descent}.''
\par
Starting from a model $\bold x = \bold m$ (which may be zero),
below is a \bx{template} of pseudocode for minimizing the residual
$\bold 0 \approx \bold r = \bold F \bold x - \bold d$
by the steepest-descent method:
\label{lsq/'sdtemplate'}
\begin{tabbing}
mmmmmm \= mmmmmm \= mmmmm \kill
\> $\bold r \padarrow \bold F \bold x - \bold d$ \\
\> {\rm iterate \{ } \\
\>      \> $\Delta \bold x \padarrow \bold F'\ \bold r$ \\
\>      \> $\Delta \bold r\ \padarrow \bold F \ \Delta \bold x$ \\
\>      \> $\alpha \padarrow -( \bold r \cdot \Delta \bold r ) /
                              ( \Delta \bold r \cdot \Delta \bold r )$ \\
\>      \> $\bold x \padarrow \bold x + \alpha \Delta \bold x$ \\
\>      \> $\bold r\ \padarrow \bold r\ + \alpha \Delta \bold r$ \\
\>      \> \}
\end{tabbing}
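\par
The steepest-descent template translates to NumPy almost line for line;
the only change from the random-directions sketch above is the choice of $\Delta \bold x$.
Again this is our own sketch, not a program from the text;
the function name, the fixed iteration count, and the convergence guard are our additions.
\begin{verbatim}
import numpy as np

def steepest_descent(F, d, x0, niter=100):
    # Same loop as before, but dx is the gradient F' r, not random numbers.
    x = np.asarray(x0, dtype=float).copy()
    r = F @ x - d
    for _ in range(niter):
        dx = F.T @ r                    # gradient direction g = F' r
        dr = F @ dx
        if dr @ dr == 0.0:              # gradient vanished: at the minimum
            break
        alpha = -(r @ dr) / (dr @ dr)   # same line-search alpha as before
        x += alpha * dx
        r += alpha * dr
    return x, r
\end{verbatim}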
\subsection{Null space and iterative methods}
In applications where we fit $\bold d \approx \bold F \bold x$,
there might exist a vector (or a family of vectors)
defined by the condition $\bold 0 = \bold F \bold x_{\rm null}$.
This family is called a \bx{null space}.
For example, if the operator $\bold F$ is a time derivative,
then the null space is the constant function;
if the operator is a second derivative,
then the null space has two components, a constant function
and a linear function, or combinations of them.
The null space is a family of model components that have no effect on the data.
\par
When we use the steepest-descent method,
we iteratively find solutions by this updating:
\begin{eqnarray}
\bold x_{i+1} &=& \bold x_i + \alpha \Delta \bold x \\
\bold x_{i+1} &=& \bold x_i + \alpha \bold F' \bold r \\
\bold x_{i+1} &=& \bold x_i + \alpha \bold F' (\bold F \bold x - \bold d)
\end{eqnarray}
After we have iterated to convergence,
the gradient $\Delta \bold x$ vanishes,
as does $\bold F' (\bold F \bold x - \bold d)$.
Thus, an iterative solver gets the same solution
as the long-winded theory leading to equation~(\ref{eqn:sln}).
\par
Suppose that by adding a huge amount of $\bold x_{\rm null}$,
we now change $\bold x$ and continue iterating.
Notice that $\Delta \bold x$ remains zero
because $\bold F \bold x_{\rm null}$ vanishes.
Thus we conclude that any null space in the initial guess $\bold x_0$
will remain there, unaffected by the gradient-descent process.
\par
Linear algebra theory enables us to dig up the entire null space
should we so desire.
On the other hand, the computer demands might be vast.
Even the memory for holding the many $\bold x$ vectors could be prohibitive.
A much simpler and more practical goal
is to find out if the null space has any members,
and if so, to view some of them.
To try to see a member of the null space,
we take two starting guesses,
and we run our iterative solver for each of them.
If the two solutions, $\bold x_1$ and $\bold x_2$, are the same,
there is no null space.
If the solutions differ, the difference is a member of the null space.
Let us see why:
Suppose after iterating to minimum residual we find
\begin{eqnarray}
\bold r_1 &=& \bold F \bold x_1 - \bold d \\
\bold r_2 &=& \bold F \bold x_2 - \bold d
\end{eqnarray}
We know that the residual squared is a convex quadratic function
of the unknown $\bold x$.
Mathematically, that means the minimizing residual vector is unique,
so $\bold r_1 = \bold r_2$.
Subtracting, we find
$\bold 0 = \bold r_1 - \bold r_2 = \bold F (\bold x_1 - \bold x_2)$,
proving that $\bold x_1 - \bold x_2$ is a model in the null space.
Adding $\bold x_1 - \bold x_2$ to any model $\bold x$
will not change the theoretical data.
Are you having trouble visualizing $\bold r$ being unique,
but $\bold x$ not being unique?
Imagine that $\bold r$ happens to be independent
of one of the components of $\bold x$.
That component is nonunique.
More generally, it is some linear combination of components of $\bold x$
that $\bold r$ is independent of.
\par
\boxit{
        A practical way to learn about the existence of null spaces
        and their general appearance is simply to try
        gradient-descent methods
        beginning from various different starting guesses.
        }
\par
``Did I fail to run my iterative solver long enough?'' is
a question you might have.
If two residuals from two starting solutions are not equal,
$\bold r_1 \ne \bold r_2$,
then you should be running your solver through more iterations.
\par
\boxit{
        If two different starting solutions
        produce two different residuals,
        then you didn't run your solver through enough iterations.
        }
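\par
The two-starting-guesses experiment is easy to carry out numerically.
The sketch below uses a first-difference matrix,
so the null space is the constant function, as in the time-derivative example above;
the helper routine is just a compact restatement of the steepest-descent template,
and every name and size in it is our own illustration.
\begin{verbatim}
import numpy as np

def descend(F, d, x, niter=1000):
    # compact steepest-descent loop, as in the template above
    x = np.asarray(x, dtype=float).copy()
    r = F @ x - d
    for _ in range(niter):
        dx = F.T @ r
        dr = F @ dx
        if dr @ dr == 0.0:
            break
        a = -(r @ dr) / (dr @ dr)
        x += a * dx
        r += a * dr
    return x, r

n = 5
F = np.diff(np.eye(n), axis=0)            # first differences: nulls constants
d = F @ np.array([0., 1., 3., 6., 10.])   # synthetic data

x1, r1 = descend(F, d, np.zeros(n))       # starting guess 1
x2, r2 = descend(F, d, np.full(n, 7.0))   # starting guess 2

print(np.allclose(r1, r2))                # True: the minimum residual is unique
print(np.round(x1 - x2, 6))               # a constant vector: a null-space member
print(np.allclose(F @ (x1 - x2), 0.0))    # True: F annihilates the difference
\end{verbatim}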
\subsection{Why steepest descent is so slow}
Before we can understand why the \bx{conjugate-direction method} is so fast,
we need to see why the \bxbx{steepest-descent method}{steepest descent} is so slow.
Imagine yourself sitting on the edge of a circular bowl.
If you jump off the rim, you'll slide straight to the bottom at the center.
Now imagine an ellipsoidal bowl of very large ellipticity.
As you jump off the rim, you'll first move in the direction of the gradient.
This is not towards the bottom at the center of the ellipse
(unless you were sitting on the major or minor axis).
\par
We can formalize the situation.
A parametric equation for a line is
$\bold x = \bold x_{\rm old} + \alpha \Delta \bold x$,
where $\alpha$ is the parameter for moving on the line.
The process of selecting $\alpha$ is called ``\bx{line search}.''
Think of a two-dimensional example
where the vector of unknowns $\bold x$ has just two components, $x_1$ and $x_2$.
Then the size of the residual vector $\bold r \cdot \bold r$ can be
displayed with a contour plot in the plane of $(x_1,x_2)$.
Our ellipsoidal bowl has ellipsoidal contours of constant altitude.
As we move in a line across this space by adjusting $\alpha$,
equation~(\ref{eqn:mindot}) gives our altitude.
This equation has a unique minimum because it is a parabola in $\alpha$.
As we approach the minimum,
our trajectory becomes tangential to a contour line in $(x_1,x_2)$-space.
This is where we stop.
Now we compute our new residual $\bold r$
and the new gradient $\Delta \bold x = \bold g = \bold F' \bold r$.
OK, we are ready for the next slide down.
When we turn ourselves from ``parallel to a contour line''
to the direction of $\Delta \bold x$, which is ``perpendicular to that contour,''
we are turning $90^\circ$.
Our path to the bottom of the bowl will be made of many segments,
each turning $90^\circ$ from the previous.
We will need an infinite number of such steps to reach the bottom.
It happens that the amazing conjugate-direction method
would reach the bottom in just two jumps
(because $(x_1,x_2)$ is a two-dimensional space).
\todo{Missing figure (ls-sawtooth) A search path for steepest descent.}
\subsection{Conjugate direction}
In the \bx{conjugate-direction method}, not a line, but rather a plane,
is searched.
A plane is made from an arbitrary linear combination of two vectors.
One vector will be chosen to be the gradient vector, say $\bold g$.
The other vector will be chosen to be the previous descent step vector,
say $\bold s = \bold x_j - \bold x_{j-1}$.
Instead of $\alpha \, \bold g$ we need a linear combination,
say $\alpha \bold g + \beta \bold s$.
For minimizing quadratic functions, the plane search requires
only the solution of a two-by-two set of linear equations
for $\alpha$ and $\beta$.
The equations will be specified here along with the program.
(For
%\it
nonquadratic
%\rm
functions a plane search is considered intractable,
whereas a line search proceeds by bisection.)
\par
For use in linear problems,
the conjugate-direction method described in this book
follows a path identical to that of the better-known conjugate-gradient method.
We use the conjugate-direction method
for convenience in exposition and programming.
\par
%\boxit{
        The simple form of the conjugate-direction algorithm covered here
        is a sequence of steps.
        In each step the minimum is found in the plane given by two vectors:
        the gradient vector and the vector of the previous step.
%        }
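\par
As a concrete sketch of how such a plane search can be organized
(this is our own NumPy illustration, not the program referred to above;
the function name, the explicit two-by-two solve, and the toy two-unknown
example are assumptions of the sketch):
\begin{verbatim}
import numpy as np

def conjugate_directions(F, d, x0, niter):
    # Each step minimizes |r + alpha*F g + beta*F s|^2 over the plane spanned
    # by the gradient g = F' r and the previous step s, via a 2 x 2 solve.
    x = np.asarray(x0, dtype=float).copy()
    r = F @ x - d
    s = np.zeros_like(x)                  # previous model step
    S = np.zeros_like(r)                  # previous residual step, F s
    for it in range(niter):
        g = F.T @ r                       # gradient direction
        G = F @ g                         # its image in data space
        if G @ G == 0.0:                  # gradient vanished: done
            break
        if it == 0:
            alpha, beta = -(r @ G) / (G @ G), 0.0   # no previous step yet
        else:
            M = np.array([[G @ G, G @ S],           # 2 x 2 normal equations
                          [S @ G, S @ S]])
            rhs = -np.array([G @ r, S @ r])
            alpha, beta = np.linalg.lstsq(M, rhs, rcond=None)[0]
        s = alpha * g + beta * s          # new step in model space
        S = alpha * G + beta * S          # new step in residual space
        x += s
        r += S
    return x, r

# Two unknowns, so the bottom of the bowl is reached in two jumps:
F = np.array([[1., 0.], [1., 1.], [1., 2.]])
d = np.array([1., 2., 4.])
x, r = conjugate_directions(F, d, np.zeros(2), niter=2)
print(np.linalg.norm(F.T @ r))            # ~0: gradient is zero after two steps
\end{verbatim}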
%\begin{comment}
%\subsection{Magic (abandoned)}
%Some properties of the conjugate-gradient and the conjugate-direction approach
%are well known but hard to explain.
%D. G. \bx{Luenberger}'s book,
%{\it Introduction to Linear and Nonlinear Programming},
%is a good place to find formal explanations of this magic.
%(His book also provides other forms of these algorithms.)
%Another helpful book is \bx{Strang}'s {\it Introduction to Applied Mathematics}.
%Known properties follow:
%\begin{itemize}
%\item[{1.}]
%The \bx{conjugate-gradient method} and the \bx{conjugate-direction method}
%get the exact answer (assuming exact arithmetic) in $n$ descent steps (or less),
%where $n$ is the number of unknowns.
%\item[{2.}]
%It is helpful to use the previous step,
%so you might wonder why not use the previous two steps,
%because it is not hard to solve a three-by-three set
%of simultaneous linear equations.
%It turns out that the third direction does not help:
%the distance moved in the extra direction is zero.
%\end{itemize}
%\end{comment}
%\subsection{Magical properties of conjugate directions (Sergey)}
Given the linear operator $\bold F$ and a generator of solution steps
(random in the case of random directions or the gradient in the case of
steepest descent),
we can construct an optimally convergent iteration process,
which finds the solution in no more than $n$ steps,
where $n$ is the size of the problem.
This result should not be surprising.
If $\bold F$ is represented by a full matrix,
then the cost of direct inversion is proportional to $n^3$,
and the cost of matrix multiplication is $n^2$.
Each step of an iterative method boils down to a matrix multiplication.
Therefore, we need at least $n$ steps to arrive at the exact solution.
Two circumstances make large-scale optimization practical.
First, for sparse convolution matrices
the cost of matrix multiplication is $n$ instead of $n^2$.
Second, we can often find a reasonably good solution
after a limited number of iterations.
If both these conditions are met, the cost of optimization
grows linearly with $n$,
which is a practical rate even for very large problems.
%\subsection{Conjugate-direction theory for programmers}
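\par
To spell out the operation counts behind this argument
(our back-of-the-envelope summary of the statements above,
with $n_{\rm iter}$ denoting the number of iterations actually taken):
\[
\hbox{direct inversion} \ \sim\ n^3,
\qquad
\hbox{iteration, dense $\bold F$} \ \sim\ n_{\rm iter}\, n^2,
\qquad
\hbox{iteration, sparse $\bold F$} \ \sim\ n_{\rm iter}\, n .
\]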
