(on the missing data).  Thus
$ \bold 0 \approx \bold A(\bold I-\bold K+\bold K)\bold m $ or
\begin{equation}
\bold 0 \quad\approx\quad \bold A (\bold I-\bold K) \bold m\ +\ \bold A \bold m_k\;,
\label{eqn:style0}
\end{equation}
where we define $\bold m_k$ to be the data
with missing values set to zero by $\bold m_k=\bold K\bold m$.
\par
A second way to find missing data is with the set of goals
\begin{equation}
\begin{array}{rrrrr}
\bold 0 & \approx & \bold K \bold m & - & \bold m_k \\
\bold 0 & \approx & \epsilon \bold A \bold m & &
\end{array}
\label{eqn:style1}
\end{equation}
and take the limit as the scalar $\epsilon \rightarrow 0$.
At that limit, we should have the same result
as equation (\ref{eqn:style0}).
\par
There is an important philosophical difference between
the first method and the second.
The first method strictly honors the known data.
The second method acknowledges that when data misfits
the regularization theory, it might be the fault of the data,
so the data need not be strictly honored.
Just what balance is proper falls to the numerical choice of $\epsilon$,
a nontrivial topic.
\par
A third way to find missing data is to precondition
equation (\ref{eqn:style1}),
namely, try the substitution $\bold m = \bold A^{-1} \bold p$.
\begin{equation}
\begin{array}{rrrrr}
\bold 0 & \approx & \bold K \bold A^{-1} \bold p &-& \bold m_k \\
\bold 0 & \approx & \epsilon \bold p & &
\end{array}
\label{eqn:style2}
\end{equation}
There is no simple way of knowing beforehand
what is the best value of $\epsilon$.
Practitioners like to see solutions for various values of $\epsilon$.
Of course that can cost a lot of computational effort.
Practical exploratory data analysis is more pragmatic.
Without a simple clear theoretical basis,
analysts generally begin from $\bold p=0$
and abandon the fitting goal $\epsilon \bold I \bold p\approx 0$.
Implicitly, they take $\epsilon=0$.
Then they examine the solution as a function of iteration,
imagining that the solution at larger iterations
corresponds to smaller $\epsilon$.
There is an eigenvector analysis
indicating some kind of basis for this approach,
but I believe there is no firm guidance.
\par
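The three styles are easy to prototype on a toy 1-D problem.
The sketch below is only an illustration, not the Fortran modules
used in this book: the causal first difference stands in for the helix
derivative (its autocorrelation is the negative second derivative and,
unlike the Laplacian itself, it is invertible),
\texttt{lsqr} stands in for our conjugate-direction solver,
and all of the names in it are my own choices.
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import lsqr

# Toy 1-D setting: the causal first difference plays the role of the
# helix derivative.
n = 50
truth = np.sin(2 * np.pi * np.arange(n) / n)
known = np.ones(n, dtype=bool)
known[15:35] = False                       # a gap of missing values
m_k = np.where(known, truth, 0.0)          # data with missing values set to zero

A = np.eye(n) - np.eye(n, k=-1)            # roughener A
Ainv = np.linalg.inv(A)                    # causal integration, in effect
K = np.diag(known.astype(float))           # selection operator K

# style 0: solve for the missing values only, strictly honoring known data:
#   0 ~ A (I - K) m + A m_k   becomes   A[:, missing] x ~ -A m_k
x = np.linalg.lstsq(A[:, ~known], -A @ m_k, rcond=None)[0]
m_style0 = m_k.copy()
m_style0[~known] = x

# style 1: regularized goals  K m ~ m_k  and  eps A m ~ 0, solved jointly
eps = 0.01
G = np.vstack([K, eps * A])
rhs = np.concatenate([m_k, np.zeros(n)])
m_style1 = lsqr(G, rhs)[0]

# style 2: precondition with m = A^{-1} p, drop the eps goal, and stop early;
# a small iteration count stands in for a small eps
p = lsqr(K @ Ainv, m_k, iter_lim=10)[0]
m_style2 = Ainv @ p
\end{verbatim}
Plotting the three results against the iteration count is a quick way
to build intuition for the trade-offs described above.
\par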
Before we look at coding details for the three methods
of filling the empty bins,
we'll compare results of trying all three methods.
For the roughening operator $\bold A$,
we'll take the helix derivative $\bold H$.
This is logically equivalent to roughening with the gradient $\bold \nabla$
because the (negative) laplacian operator is
$\bold\nabla'\bold\nabla = \bold H'\bold H $.
%Later, in Chapter
%\ref{mda/paper:mda}
%we'll find a smarter roughening operator $\bold A$ that gives better results.
\subsection{SEABEAM: Filling the empty bins with a laplacian}
\sx{seabeam}
\inputdir{seab}
Figure~\ref{fig:seabin} shows a day's worth of data\footnote{
	I'd like to thank Alistair Harding for this interesting data set
	named April 18.
	\sx{Harding, Alistair}
	}
collected at sea by \bx{SeaBeam},
an apparatus for measuring water depth both
directly under a ship, and somewhat off to the sides of the ship's track.
The data is measurements of depth $h(x,y)$ at miscellaneous locations
in the $(x,y)$-plane.
\sideplot{seabin}{width=3.0in,height=3.0in}{
	Depth of the ocean under ship tracks.
	Empty bins are displayed with an average depth $\bar h$.
	}
The locations are scattered about,
according to various aspects
of the ship's navigation and the geometry of the SeaBeam sonic antenna.
Figure \ref{fig:seabin} was made by binning with
\texttt{bin2()} \vpageref{lst:bin2}
and equation (\ref{ajt/eqn:dpbin}).
The spatial spectra of the noise in the data
could be estimated where tracks cross over themselves.
This might be worthwhile, but we do not pursue it now.
\inputdir{seabeam}
\par
Here we focus on the empty mesh locations where no data is recorded
(displayed with the value of the mean depth $\bar h$).
These empty bins were filled with
module \texttt{mis2} \vpageref{lst:mis2}.
Results are in Figure \ref{fig:prcfill}.
\plot{prcfill}{width=6in,height=8in}{
	The ocean bottom, with missing data restored using the helix derivative.
	}
In Figure \ref{fig:prcfill} the left column results from 20 iterations
while the right column results from 100 iterations.
\par
The top row in Figure \ref{fig:prcfill} shows that more iterations
spread information further into the region of missing data.
\par
It turned out that the first method (strictly honoring the known data)
gave results so similar to those of the second method (regularizing)
that the plots could not be visually distinguished.
The middle row in Figure \ref{fig:prcfill} therefore shows
the difference between the results of the two methods.
We see an outline of the transition between known and unknown regions.
Obviously, the missing data is pulling known data towards zero.
\par
The bottom row in Figure \ref{fig:prcfill} shows that preconditioning
spreads information to great distances much more quickly,
but early iterations make little effort to honor the data.
(Even though these results are for $\epsilon=0$.)
Later iterations make little change at long distance
but begin to restore sharp details on the small features
of the known topography.
\par
What if we can only afford 100 iterations?
Perhaps we should first do 50 iterations with
preconditioning to develop the remote part of the solution and
then do 50 iterations by one of the other two methods to be sure
we attended to the details near the known data.
A more unified approach (not yet tried, as far as I know)
would be to merge the techniques.
The conjugate-direction method searches two directions,
the gradient and the previous step.
We could add a third direction,
the smart direction of equation (\ref{eqn:newdirection}).
Instead of having a $2\times 2$ matrix
solution like equation (\ref{lsq/eqn:twobytwosln}) for two distances,
we would need to solve a $3\times 3$ system for three.
\par
Figure \ref{fig:prcfill} has a few artifacts connected with the
use of the helix derivative.
Examine equation (\ref{hlx/eqn:lapfac}) to notice the shape of the helix derivative.
In principle, it is infinitely long along the horizontal axis
in both equation (\ref{hlx/eqn:lapfac}) and Figure \ref{fig:prcfill}.
In practice, it is truncated.
The truncation is visible as bands
along the sides of Figure \ref{fig:prcfill}.
\par
As a practical matter, no one would use the first two bin-filling methods
with the helix derivative for the roughener,
because it is theoretically equivalent to the gradient operator
$\bold \nabla$, which has many fewer coefficients.
Later, in Chapter
\ref{mda/paper:mda},
we'll find a much smarter roughening operator $\bold A$,
called the Prediction Error Filter (PEF),
which gives better results.
%The top row in Figure \ref{fig:prcfill} shows that more iterations
%The second
%After many iterations, both regularization and preconditioning
%lead us to the same result.
%After a small number of iterations, we see that
%regularization has filled the small holes
%but it has not reached out far away from the known data.
%With preconditioning, it is the opposite.
\subsection{Three codes for inverse masking}
The selection (or masking) operator $\bold K$ is
implemented in
\texttt{mask()} \vpageref{lst:mask}.
\opdex{mask}{copy under mask}{45}{50}{filt/lib}
%The inverting of the mask operator proceeds much as
%we inverted the linear interpolation operator with
%module \texttt{invint2} \vpageref{lst:invint2}.
%The main difference is we swap the selection operator
%for the linear interpolation operator.
%(Philosophically, selection is like binning which is
%like nearest-neighbor interpolation.)
All the results shown in Figure \ref{fig:prcfill}
were created with the module
\texttt{mis2} \vpageref{lst:mis2}.
Code locations with \texttt{style=0,1,2}
correspond to the fitting goals
(\ref{eqn:style0}), (\ref{eqn:style1}), and (\ref{eqn:style2}).
\moddex{mis2}{Missing data interpolation with and without preconditioning}{28}{58}{user/gee} %XX
\par
%It is instructive to compare
%\texttt{mis2} \vpageref{lst:mis2} with
%\texttt{invint2} \vpageref{lst:invint2}.
%Both are essentially filling empty regions
%consistant with prior knowledge at particular locations
%and minimizing energy of the filtered field.
%Both use the helix and can be used in $N$-dimensional space.
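For experimentation outside the book's Fortran library, the selection
operator is easy to mimic.
The NumPy sketch below (the function name and calling convention are my
own, not those of \texttt{mask()}) applies ``copy under mask'' in both
forward and adjoint modes and checks the dot-product test that our
iterative solvers depend on; because $\bold K$ is a diagonal matrix of
zeros and ones, the two modes are the same operation.
\begin{verbatim}
import numpy as np

def mask_op(adj, known, x, y):
    # Copy under mask: y += K x (forward) or x += K' y (adjoint).
    # K is diagonal with zeros and ones, so K' = K and both branches
    # perform the same copy, just on different arguments.
    if not adj:
        y[known] += x[known]
    else:
        x[known] += y[known]

# dot-product test: y' (K x) must equal (K' y)' x
rng = np.random.default_rng(1)
n = 100
known = rng.random(n) > 0.3            # True where data exists
x = rng.standard_normal(n)
y = rng.standard_normal(n)

Kx = np.zeros(n)
mask_op(False, known, x, Kx)
Kty = np.zeros(n)
mask_op(True, known, Kty, y)
print(y @ Kx, Kty @ x)                 # the two numbers should agree
\end{verbatim}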
\section{THEORY OF UNDERDETERMINED LEAST-SQUARES}
Construct theoretical data with
\begin{equation}
\bold d \eq \bold F \bold m
\label{eqn:theordata}
\end{equation}
Assume there are fewer data points than model points
and that the matrix $\bold F \bold F'$ is invertible.
From the theoretical data we estimate a model $\bold m_0$ with
\begin{equation}
\bold m_0 \eq \bold F' (\bold F \bold F')^{-1} \bold d
\label{eqn:underdetest}
\end{equation}
To verify the validity of the estimate,
insert the estimate (\ref{eqn:underdetest}) into the
data modeling equation (\ref{eqn:theordata}) and notice
that the estimate $\bold m_0$ predicts the correct data.
Notice that equation (\ref{eqn:underdetest}) is not the same
as equation (\ref{lsq/eqn:sln}), which we derived much earlier.
What's the difference?
The central issue is which of the two matrices
$\bold F \bold F'$ and
$\bold F' \bold F$ actually has an inverse.
If $\bold F$ is a rectangular matrix,
then it is certain that one of the two is not invertible.
(There are plenty of real cases where neither matrix is invertible.
That's one reason we use iterative solvers.)
Here we are dealing with the case of more model points than data points.
\par
Now we will show that of all possible models $\bold m$ that
predict the correct data, $\bold m_0$ has the least energy.
(I'd like to thank Sergey Fomel for this clear and simple proof
that does {\em not} use Lagrange multipliers.)
First split (\ref{eqn:underdetest}) into an intermediate
result $\bold d_0$ and the final result:
\begin{eqnarray}
\bold d_0 &=& (\bold F \bold F')^{-1} \bold d \\
\bold m_0 &=& \bold F' \bold d_0
\end{eqnarray}
Consider another model ($\bold x$ not equal to zero)
\begin{equation}
\bold m \eq \bold m_0 + \bold x
\end{equation}
which fits the theoretical data
$\bold d = \bold F(\bold m_0+\bold x)$.
Since
$\bold d=\bold F\bold m_0$,
we see
that $\bold x$ is a null-space vector.
\begin{equation}
\bold F \bold x \eq \bold 0
\end{equation}
Next we see that $\bold m_0$ is orthogonal to $\bold x$ because
\begin{equation}
\bold m_0' \bold x \eq (\bold F' \bold d_0)' \bold x
\eq \bold d_0' \bold F \bold x \eq \bold d_0' \bold 0 \eq 0
\label{eqn:dzero}
\end{equation}
Therefore,
\begin{equation}
\bold m' \bold m \eq
\bold m_0' \bold m_0 + \bold x'\bold x + 2 \bold x' \bold m_0 \eq
\bold m_0' \bold m_0 + \bold x'\bold x
\quad\ge\quad \bold m_0' \bold m_0
\end{equation}
so adding a null-space component to $\bold m_0$ can only increase its energy.
In summary,
the solution
$\bold m_0 = \bold F' (\bold F \bold F')^{-1} \bold d$
has less energy than any other model that satisfies the data.
\par
Not only does the theoretical solution
$\bold m_0 = \bold F' (\bold F \bold F')^{-1} \bold d$
have minimum energy,
but the result of iterative descent will too,
provided that we begin the iteration from $\bold m=\bold 0$
or from any model with no null-space component.
In (\ref{eqn:dzero}) we see that the orthogonality $\bold m_0'\bold x=0$
does not arise because $\bold d_0$ has any particular value.
It arises because $\bold m_0$ is of the form $\bold F'\bold d_0$.
Gradient methods contribute $\Delta\bold m =\bold F' \bold r$,
which is of the required form.
%\section{VIRTUAL-RESIDUAL PRECONDITIONING}
%Sergey Fomel has developed a wholly different
%approach to preconditioning to that described above.
%He calls his method the ``model-extension'' method.
%I call it the ``virtual-residual'' method.
%Both names are suggestive of the method.
%\par
%We begin with the usual definition
%of preconditioned variables $\bold m =\bold S\bold p$ and
%the opposite of our usual sign convention for the residual.
%\begin{eqnarray}
%-\epsilon \ \bold r &=& \bold F \bold m - \bold d \\
%-\epsilon \ \bold r &=& \bold F \bold S \bold p - \bold d
%\label{eq:r}
%\end{eqnarray}
%which we arrange with a column vector of unknowns that includes
%not only $\bold p$ but also $\bold r$.
%\begin{equation}
%\bold 0
%\quad\approx\quad \hat {\bold r}
%\eq
%\left[ \bold F \bold S \quad \epsilon \ \bold I \right]
%\
%\left[
%\begin{array}{c}
%\bold p \\
%\bold r
%\end{array}
%\right]
%\ - \ \bold d
%\end{equation}
%The interesting thing is that we can use our familiar
%conjugate-gradient programs with this system of equations.
%We only need to be careful to distinguish the residual in the bottom half
%of our solution vector,
%from the virtual residual $\hat {\bold r}$
%(which should vanish identically)
%in the iterative fitting problem.
%The fitting begins with the initializations:
%\begin{equation}
%\hat{\bold p}_0 \eq \left[
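The algebra above is easy to verify numerically.
The following NumPy sketch (my own illustration, not one of this book's
modules) builds a random $\bold F$ with fewer rows than columns, confirms
that $\bold m_0=\bold F'(\bold F\bold F')^{-1}\bold d$ predicts the data
exactly and is orthogonal to the null space, and checks that an iterative
solver started from zero (here \texttt{lsqr}) arrives at the same
minimum-energy model.
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
nd, nm = 6, 15                        # fewer data points than model points
F = rng.standard_normal((nd, nm))     # F F' is (almost surely) invertible
d = F @ rng.standard_normal(nm)       # theoretical data d = F m

# minimum-energy estimate m0 = F' (F F')^{-1} d
m0 = F.T @ np.linalg.solve(F @ F.T, d)
print(np.allclose(F @ m0, d))         # m0 predicts the data; should be True

# any x with F x = 0 is orthogonal to m0 and can only add energy
x = rng.standard_normal(nm)
x -= F.T @ np.linalg.solve(F @ F.T, F @ x)    # project x onto the null space
print(np.allclose(F @ x, 0.0), abs(m0 @ x) < 1e-10)   # should be True, True
print(m0 @ m0 <= (m0 + x) @ (m0 + x))                 # energy only grows

# iterative descent from m = 0 stays in the range of F'
# and finds the same minimum-energy model
m_iter = lsqr(F, d, atol=1e-12, btol=1e-12)[0]
print(np.allclose(m_iter, m0))                        # should be True
\end{verbatim}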