\begin{equation}
\bold 0 \quad \approx \quad
\left[
\begin{array}{ccc}
 0 & y_1 & 0   \\
 0 & y_2 & y_1 \\
 0 & y_3 & y_2 \\
 0 & y_4 & y_3 \\
 0 & y_5 & y_4
\end{array}
\right] \;
\left[
\begin{array}{c}
 1 \\ a_1 \\ a_2
\end{array}
\right]
\ +\
\left[
\begin{array}{c}
 y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6
\end{array}
\right]
\label{eqn:stdpef}
\end{equation}
Let us move from this specific fitting goal to the general case.
(Notice the similarity of the free-mask matrix $\bold K$
in this filter estimation problem with the
free-mask matrix $\bold J$ in the missing-data goal (\ref{iin/eqn:migraine}).)
The fitting goal is,
\begin{eqnarray}
\bold 0 &\approx & \bold Y \bold a \\
\bold 0 &\approx & \bold Y(\bold I-\bold K+\bold K) \bold a \\
\bold 0 &\approx & \bold Y\bold K\bold a +\bold Y(\bold I-\bold K)\bold a \\
\bold 0 &\approx & \bold Y\bold K\bold a +\bold Y \bold a_0 \\
\bold 0 &\approx & \bold Y\bold K\bold a +\bold y \\
\bold 0 \quad\approx\quad \bold r &= & \bold Y\bold K\bold a +\bold r_0
\label{eqn:pefregression}
\end{eqnarray}
which means we initialize the residual with
$\bold r_0 = \bold y$,
and then iterate with
\begin{eqnarray}
\Delta \bold a &\longleftarrow& \bold K' \bold Y'\ \bold r \\
\Delta \bold r &\longleftarrow& \bold Y \bold K \ \Delta \bold a
\end{eqnarray}
%Bringing this all together gives us subroutine \texttt{gdecon()}.
%\moddex{gdecon}{gapped decon filt}
%\end{notforlecture}
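Here is a minimal NumPy sketch of the iteration above.
It is an illustration only:
the names \texttt{convolution\_matrix} and \texttt{estimate\_pef} are invented,
and plain steepest descent with an exact line search
stands in for a conjugate-direction solver.
\begin{verbatim}
import numpy as np

def convolution_matrix(y, na):
    """Transient convolution matrix Y, so that Y @ a == np.convolve(y, a)."""
    ny = len(y)
    Y = np.zeros((ny + na - 1, na))
    for k in range(na):
        Y[k:k + ny, k] = y
    return Y

def estimate_pef(y, na, niter=200):
    """Fit 0 ~ r = Y K a + r0, with r0 = Y a0 and the leading 1 locked."""
    Y = convolution_matrix(y, na)
    K = np.diag([0.0] + [1.0] * (na - 1))  # free-mask: only a_1..a_{na-1} move
    a = np.zeros(na); a[0] = 1.0           # a0 = (1, 0, ..., 0)
    r = Y @ a                              # initialize residual: r0 = Y a0
    for _ in range(niter):
        da = K.T @ (Y.T @ r)               # Delta a <- K' Y' r
        dr = Y @ (K @ da)                  # Delta r <- Y K Delta a
        if dr @ dr == 0.0:
            break
        alpha = -(r @ dr) / (dr @ dr)      # exact line-search step size
        a += alpha * da
        r += alpha * dr
    return a, r
\end{verbatim}
Calling \texttt{estimate\_pef(y, 3)} on a short test signal
returns the three-term PEF $(1, a_1, a_2)$ along with its residual.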
\section{PREDICTION-ERROR FILTER OUTPUT IS WHITE}
\subsubsection{The relationship between spectrum and PEF}
Knowledge of an autocorrelation function
is equivalent to knowledge of a spectrum.
The two are simply related by Fourier transform.
A spectrum or an autocorrelation function encapsulates
an important characteristic of a signal or an image.
Generally, the spectrum changes slowly from place to place,
although it can change rapidly.
Of all the assumptions we could make to fill empty bins,
one that people usually find easiest to agree with is that
the spectrum should be the same
in the empty-bin regions as where the bins are filled.
In practice we deal with neither the spectrum
nor its autocorrelation, but with a third object.
This third object is the prediction-error filter (PEF),
the filter in equation (\ref{eqn:simplepef}).
\par
Take equation (\ref{eqn:simplepef}) for $\bold r$ and multiply it
by the adjoint $\bold r'$, getting a quadratic form in the PEF
coefficients.
Minimizing this quadratic form determines the PEF.
This quadratic form depends only on the autocorrelation
of the original data $y_t$, not on the data $y_t$ itself.
Clearly, the PEF is unchanged if the data has its polarity reversed
or its time axis reversed.
Indeed, we'll see here that knowledge of the PEF
is equivalent to knowledge of the autocorrelation or the spectrum.
\subsubsection{Undoing convolution in nature}
\inputdir{XFig}
Prediction-error filtering is also called ``\bx{deconvolution}''.
This word goes back to very basic models and concepts.
In this model one envisions
a random white-spectrum excitation function $\bold x$
existing in nature, and this excitation function
is somehow filtered by unknown natural processes,
with a filter operator $\bold B$
producing an {\it output} $\bold y$ in nature
that becomes the {\it input} $\bold y$
to our computer programs.
This is sketched in Figure \ref{fig:systems}.
\sideplot{systems}{width=3in,height=1in}{
Flow of information from nature, to observation, into computer.
}
Then we design a prediction-error filter $\bold A$ on $\bold y$,
which yields an output residual $\bold r$ with a white spectrum.
Because $\bold r$ and $\bold x$ theoretically have the same spectrum,
the tantalizing prospect is that maybe $\bold r$ equals $\bold x$,
meaning that the PEF $\bold A$ has {\it deconvolved}
the unknown convolution $\bold B$.
\subsubsection{Causal with causal inverse}
Theoretically, a PEF is a causal filter with a causal inverse.
This adds confidence to the likelihood that deconvolution
of natural processes with a PEF might get the correct phase spectrum
as well as the correct amplitude spectrum.
Naturally, the PEF does not give the correct phase to an ``all-pass'' filter,
that is, a filter with a phase shift but a constant amplitude spectrum.
(I think most migration operators are in this category.)
\par
Theoretically, we should be able to use a PEF
in either convolution or polynomial division.
There are some dangers, though,
mainly connected with dealing with data in small windows.
Truncation phenomena might give us PEF estimates
that are causal, but whose inverse is not,
so they cannot be used in polynomial division.
This is a lengthy topic in the classic literature.
This old, fascinating subject is examined in my books, FGDP and PVI.
A classic solution is one by John Parker Burg.
We should revisit the Burg method in light of the helix.
\subsubsection{PEF output tends to whiteness}
The most important property of a \bx{prediction-error filter}
or \bx{PEF} is that
its output tends to a \bx{white spectrum} (to be proven here).
No matter what the input to this filter,
its output tends to whiteness as the number of coefficients
$n \rightarrow \infty$.
Thus, the \bx{PE filter} adapts itself to the input
by absorbing all its \bx{color}.
This has important statistical implications and
important geophysical implications.
\subsubsection{Spectral estimation}
\par
The PEF's output being white leads to an important consequence:
to specify a spectrum,
we can give the spectrum (of an input) itself,
give its autocorrelation,
or give its PEF coefficients.
Each is transformable to the other two.
Indeed, an effective mechanism of spectral estimation,
developed by John P.~\bx{Burg} and described
in \bx{FGDP},
is to compute a PE filter and look at the inverse of its spectrum.
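As a concrete illustration of this mechanism, the sketch below fits a PEF
by ordinary least squares (standing in for Burg's actual algorithm)
and returns the inverse of the PEF's power spectrum;
the name \texttt{pef\_spectrum} and the details are invented for illustration.
\begin{verbatim}
import numpy as np

def pef_spectrum(y, na, nfft=512):
    """Fit a PEF by least squares; the spectral estimate is 1/|A(w)|^2."""
    ny = len(y)
    # Regression: predict y_t from y_{t-1}, ..., y_{t-na+1}.
    lags = np.column_stack([y[na - 1 - k : ny - k] for k in range(1, na)])
    c, *_ = np.linalg.lstsq(lags, y[na - 1:], rcond=None)
    a = np.concatenate(([1.0], -c))        # PEF a = (1, a_1, ..., a_{na-1})
    A = np.fft.rfft(a, nfft)               # frequency response of the PEF
    return 1.0 / (np.abs(A)**2 + 1e-12)    # spectral estimate, up to scale
\end{verbatim}
Because the PEF whitens its input, $|A(\omega)|^2$ is approximately
inverse to the input spectrum, so its reciprocal serves as the
spectral estimate (up to an overall scale).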
\subsubsection{Short windows}
\par
The power of a PE filter is that a short filter can often extinguish,
and thereby represent, the information in a long resonant filter.
If the input to the PE filter is a sinusoid,
it is exactly predictable by a three-term recurrence relation,
and all the color is absorbed by a three-term PE filter
(see exercises), as the worked example below shows.
Burg's spectral estimation is especially effective in short windows.
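For instance, a sampled sinusoid $y_t = \cos(\omega t + \phi)$
(with unit sampling interval) obeys the exact identity
\[
y_t \;=\; 2\cos\omega \;\, y_{t-1} \;-\; y_{t-2} ,
\]
so the three-term filter $(1,\, -2\cos\omega,\, 1)$ annihilates it,
whatever its amplitude and phase.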
\subsubsection{Weathered layer resonance}
\par
That the output spectrum of a PE filter is \bx{white} is also
useful geophysically.
Imagine the reverberation of the \bx{soil} layer,
highly variable from place to place,
as the resonance between the surface and shallow,
more-consolidated soil layers
varies rapidly with surface location
because of geologically recent fluvial activity.
The spectral \bx{color} of this erratic variation on surface-recorded
seismograms is compensated for by a PE filter.
Usually we do not want PE-filtered seismograms to be white,
but once they all have the same spectrum,
it is easy to postfilter them to any desired spectrum.
\subsection{PEF whiteness proof in 1-D}
\par
\label{mda/'white_proof'}
The basic idea of least-squares fitting
is that the residual is orthogonal to the fitting functions.
Applied to the PE filter, this idea means
that the output of a PE filter is orthogonal to lagged inputs.
The \bx{orthogonality} applies only for lags in the past,
because prediction knows only the past while it aims at the future.
What we want to show here is different,
namely, that the output is uncorrelated with {\it itself}
(as opposed to the input) for lags in {\it both} directions;
hence, the output spectrum is \bx{white}.
\par
In (\ref{eqn:tworegrs}) are two separate and independent autoregressions,
$\bold 0 \approx \bold Y_a \bold a$
for finding the filter $\bold a$,
and
$\bold 0 \approx \bold Y_b \bold b$
for finding the filter $\bold b$.
By noticing that the two matrices are really the same
(except that a row of zeros at the bottom of
$\bold Y_a$
is a row at the top of
$\bold Y_b$),
we realize that the two regressions must result in the same filters,
$\bold a = \bold b$,
and the residual $\bold r_b$ is a shifted version of $\bold r_a$.
In practice, I visualize the matrix as being a thousand components tall
(or a million)
and a hundred components wide.
\begin{equation}
\bold 0 \ \approx\ \bold r_a \ =\
\left[
\begin{array}{ccc}
 y_1 & 0   & 0   \\
 y_2 & y_1 & 0   \\
 y_3 & y_2 & y_1 \\
 y_4 & y_3 & y_2 \\
 y_5 & y_4 & y_3 \\
 y_6 & y_5 & y_4 \\
 0   & y_6 & y_5 \\
 0   & 0   & y_6 \\
 0   & 0   & 0
\end{array}
\right] \;
\left[
\begin{array}{c}
 1 \\ a_1 \\ a_2
\end{array}
\right]
\ ; \quad\quad
\bold 0 \ \approx\ \bold r_b \ =\
\left[
\begin{array}{ccc}
 0   & 0   & 0   \\
 y_1 & 0   & 0   \\
 y_2 & y_1 & 0   \\
 y_3 & y_2 & y_1 \\
 y_4 & y_3 & y_2 \\
 y_5 & y_4 & y_3 \\
 y_6 & y_5 & y_4 \\
 0   & y_6 & y_5 \\
 0   & 0   & y_6
\end{array}
\right] \;
\left[
\begin{array}{c}
 1 \\ b_1 \\ b_2
\end{array}
\right]
\label{eqn:tworegrs}
\end{equation}
When the energy $\bold r'\bold r$
of a residual has been minimized,
the residual $\bold r$ is orthogonal to the fitting functions.
For example, choosing $a_2$ to minimize
$\bold r'\bold r$
gives
$0 = \partial \bold r'\bold r / \partial a_2
   = 2 \bold r' \, \partial \bold r / \partial a_2$.
This shows that $\bold r'$ is perpendicular to
$\partial \bold r / \partial a_2$,
which is the rightmost column of the $\bold Y_a$ matrix.
Thus, the vector $\bold r_a$
is orthogonal to all the columns in the $\bold Y_a$ matrix
except the first (because we do not minimize with respect to $a_0$).
\par
Our goal is a different theorem that is imprecise when applied
to the three-coefficient filters displayed in (\ref{eqn:tworegrs}),
but becomes valid as the filter length tends to infinity,
$\bold a = (1, a_1, a_2, a_3, \cdots)$,
and the matrices become infinitely wide.
Actually, all we require is that the last component in $\bold b$,
namely $b_n$, tend to zero.
This generally happens because as $n$ increases,
$y_{t-n}$ becomes a weaker and weaker predictor of $y_t$.
\par
The matrix $\bold Y_a$ contains
all of the columns that are found in $\bold Y_b$
except the last (and the last one is not important).
This means that $\bold r_a$ is not only orthogonal to all
of $\bold Y_a$'s columns (except the first),
but $\bold r_a$ is also orthogonal to all of
$\bold Y_b$'s columns except the last.
Although $\bold r_a$ isn't really perpendicular to the last column
of $\bold Y_b$, it doesn't matter, because that column
has hardly any contribution to $\bold r_b$,
since $|b_n| \ll 1$.
Because $\bold r_a$ is (effectively)
orthogonal to all the components of $\bold r_b$,
$\bold r_a$ is also orthogonal to $\bold r_b$ itself.
(For any $\bold u$ and $\bold v$, if
$\bold r \cdot \bold u = 0$ and
$\bold r \cdot \bold v = 0$, then
$\bold r \cdot (\bold u + \bold v) = 0$ and also
$\bold r \cdot (a_1 \bold u + a_2 \bold v) = 0$.)
\par
Here is a detail:
in choosing the example of equation (\ref{eqn:tworegrs}),
I have shifted the two fitting problems by only one lag.
We would like to shift by more lags and get the same result.
For this we need more filter coefficients.
By adding many more filter coefficients, we are adding many more columns
to the right side of $\bold Y_b$.
That's good, because we'll be needing to neglect more columns
as we shift $\bold r_b$ further from $\bold r_a$.
Neglecting these columns is commonly justified by the experience
that ``after short-range regressors have had their effect,
long-range regressors generally find little remaining to predict.''
(Recall that the damped harmonic oscillator from physics,
the finite-difference equation that predicts the future from the past,
uses only two lags.)
\par
Here is the main point:
since $\bold r_b$ and $\bold r_a$ both contain the same signal $\bold r$,
but time-shifted,
the orthogonality at all shifts means that the autocorrelation
of $\bold r$
vanishes at all lags.
An exception, of course, is at zero lag.
The autocorrelation does not vanish there,
because $\bold r_a$ is not orthogonal to its first column
(because we did not minimize with respect to $a_0$).
\par
As we redraw
$\bold 0 \approx \bold r_b = \bold Y_b \bold b$
for various lags,
we may shift the columns only downward,
because shifting them upward would bring in the first column
of $\bold Y_a$, and the residual $\bold r_a$ is not orthogonal to that.
Thus, we have proven only that
one side of the autocorrelation of $\bold r$ vanishes.
That is enough, however, because autocorrelation functions
are symmetric, so if one side vanishes, the other must also.
\par
If $\bold a$ and $\bold b$ were two-sided
filters like $(\cdots, b_{-2}, b_{-1}, 1, b_1, b_2, \cdots)$,
the proof would break.
If $\bold b$ were two-sided, $\bold Y_b$ would catch the nonorthogonal
column of $\bold Y_a$.
Not only is $\bold r_a$ not proven to be perpendicular
to the first column of $\bold Y_a$,
but it cannot be orthogonal to it,
because a signal cannot be orthogonal to itself.
\par
The implications of this theorem are far-reaching.
The residual $\bold r$,
a convolution of $\bold y$ with $\bold a$,
has an autocorrelation that is an impulse function.
The Fourier transform of an impulse is a constant.
Thus, the spectrum of the residual is ``white''.
Thus, $\bold y$ and $\bold a$ have mutually inverse spectra.
\par
\boxit{
Since the output of a PEF is white, the PEF itself has a spectrum
inverse to its input.
}
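The theorem invites a quick numerical check.
In the sketch below (illustrative only; the coloring filter and the
test setup are invented), white noise is colored by an all-pole filter,
a three-term PEF is fit by least squares, and the PEF output's
autocorrelation comes out close to an impulse.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)     # white excitation, standing in for nature's x

# Color it with an all-pole filter: y_t = x_t + 1.4 y_{t-1} - 0.6 y_{t-2}
y = np.zeros_like(x)
for t in range(len(x)):
    y[t] = x[t]
    if t >= 1: y[t] += 1.4 * y[t - 1]
    if t >= 2: y[t] -= 0.6 * y[t - 2]

# Fit a three-term PEF a = (1, a_1, a_2) by least squares.
na = 3
lags = np.column_stack([y[na - 1 - k : len(y) - k] for k in range(1, na)])
c, *_ = np.linalg.lstsq(lags, y[na - 1:], rcond=None)
a = np.concatenate(([1.0], -c))

r = np.convolve(y, a)              # the PEF output
acor = [(r[k:] @ r[:len(r) - k]) / (r @ r) for k in range(5)]
print(np.round(a, 2))              # about (1, -1.4, 0.6): the coloring is undone
print(np.round(acor, 2))           # about (1, 0, 0, 0, 0): an impulse, hence white
\end{verbatim}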
\par
An important application of the PEF
is in missing-data interpolation.
We'll see examples later in this chapter.
My third book,
PVI,\footnote{http://sepwww.stanford.edu/sep/prof/pvi/toc\_html/index.html}
has many
examples\footnote{http://sepwww.stanford.edu/sep/prof/pvi/tsa/paper\_html/node1.html}
in one dimension, with both synthetic data and field data,
including the \texttt{gap} parameter.
Here we next extend these ideas to two (or more) dimensions.
\subsection{Simple dip filters}
\sx{filter ! multidimensional}