📄 paper.tex
字号:
% copyright (c) 2005 Jon Claerbout\title{Model fitting by least squares}\author{Jon Claerbout}\maketitle\label{paper:lsq}%\def\figdir{Fig}%\def\sx#1{}%\def\bx{}%\def\bxbx#1#2{#1}%\def\eq{\quad =\quad}\def\ff{{\bf f}}\def\dd{{\bf d}}\sx{least squares}%\begin{notforlecture}\parThe first level of computer use in science and engineering is \bx{modeling}.Beginning from physical principles and design ideas,the computer mimics nature.After this, the worker looks at the result and thinks a while,then alters the modeling program and tries again.The next, deeper level of computer use is that the computer itselfexamines the results of modeling and reruns the modeling job.This deeper levelis variously called``\bx{fitting}" or``\bx{estimation}" or``\bx{inversion}."We inspect the \bx{conjugate-direction method} of fittingand write a subroutine for it that will be used in most ofthe examples in this monograph.%\end{notforlecture}\section{HOW TO DIVIDE NOISY SIGNALS}%\begin{notforlecture}If "inversion" is dividing by a matrix,then the place to begin is dividing one number by another,say one function of frequency by another function of frequency.A single parameter fitting problem arises in Fourier analysis,where we seek a ``best answer'' at each frequency,then combine all the frequencies to get a best signal.Thus emerges a wide family of interesting and useful applications.However, Fourier analysis first requires us to introduce complex numbersinto statistical estimation.\parMultiplication in the Fourier domain is \bx{convolution} in the time domain.Fourier-domain division is time-domain \bx{deconvolution}.This division is challenging when the divisor has observational error.Failure erupts if zero division occurs.More insidious are the poor results we obtainwhen zero division is avoided by a near miss.\subsection{Dividing by zero smoothly}\sx{divide by zero}\sx{zero divide}Think of any real numbers $x$, $y$, and $f$ where $y=xf$.Given $y$ and $f$ we see a computer program containing $x=y/f$.How can we change the program so that it never divides by zero?A popular answer is to change $x=y/f$to $x=yf/(f^2+\epsilon^2)$, where $\epsilon$ is any tiny value.When $|f| >> |\epsilon|$,then $x$ is approximately $y/f$ as expected.But when the divisor $f$ vanishes,the result is safely zero instead of infinity.The transition is smooth,but some criterion is needed to choose the value of $\epsilon$.This method may not be the only way or the best wayto cope with\bxbx{zero division}{zero divide},but it is a good way,and it permeates the subject of signal analysis.\parTo apply this method in the Fourier domain,suppose that $X$, $Y$, and $F$ are complex numbers.What do we do then with $X=Y/F$?We multiply thetop and bottom by the complex conjugate $\overline{F}$,and again add $\epsilon^2$ to the denominator.Thus,\begin{equation}X(\omega) \eq{ \overline{F(\omega)} \ Y(\omega) \over \overline{F(\omega)} F(\omega) \ +\ \epsilon^2}\label{eqn:z1}\end{equation}Now the denominator must always be a positive number greater than zero,so division is always safe.Equation~(\ref{eqn:z1}) ranges continuously from\bx{inverse filter}ing, with\sx{filter ! inverse}$X=Y/F$, to filtering with $X=\overline{F}Y$,which is called ``\bx{matched filter}ing.''\sx{filter ! matched}Notice that for any complex number $F$,the phase of $1/F$ equals the phase of $\overline{F}$,so the filters $1/F$ and $\overline{F}$have inverse amplitudes but identical phase.\subsection{Damped solution}Equation (\ref{eqn:z1}) is the solution to an optimization problemthat arises in many applications.Now that we know the solution, let us formally define the problem.First, we will solve a simpler problem with real values:we will choose to minimize the \bx{quadratic function} of $x$:\begin{equation}Q(x) \eq (f x-y)^2 + \epsilon^2 x^2\label{eqn:z2}\end{equation}The second term is called a ``\bx{damping} factor"because it prevents $x$ from going to $\pm \infty$ when $f\rightarrow 0$.Set $dQ/dx=0$, which gives\begin{equation}0 \eq f(f x-y) + \epsilon^2 x\label{eqn:z3}\end{equation}This yields the earlier answer $x=fy/(f^2+\epsilon^2)$.\parWith Fourier transforms,the signal $X$ is a complex number at each frequency $\omega$.So we generalize equation~(\ref{eqn:z2}) to\begin{equation}Q(\bar X, X) \eq(\overline{FX-Y}) (FX-Y) + \epsilon^2 \bar X X \eq(\bar X \bar F - \bar Y) (FX-Y) + \epsilon^2 \bar X X\label{eqn:z4}\end{equation}To minimize $Q$ we could use a real-values approach,where we express$X=u+iv$ in terms of two real values $u$ and $v$and then set $\partial Q/\partial u=0$ and $\partial Q/\partial v=0$.The approach we will take, however,is to use complex values,where we set$\partial Q/\partial X=0$ and $\partial Q/\partial \bar X=0$.Let us examine $\partial Q/\partial \bar X$:\begin{equation}{\partial Q(\bar X, X)\over \partial \bar X} \eq\bar F (FX-Y) + \epsilon^2 X \eq 0\label{eqn:z5}\end{equation}The derivative $\partial Q/\partial X$ isthe complex conjugate of $\partial Q/\partial \bar X$.So if either is zero, the other is too.Thus we do not need to specify both$\partial Q/\partial X=0$ and $\partial Q/\partial \bar X=0$.I usually set$\partial Q/\partial \bar X$ equal to zero.Solving equation~(\ref{eqn:z5}) for $X$gives equation~(\ref{eqn:z1}).\parEquation~(\ref{eqn:z1}) solves $Y=XF$ for $X$,giving the solution for what is called``the \bx{deconvolution} problem with a known wavelet $F$."Analogously we can use $Y=XF$ when the filter $F$ is unknown,but the input $X$ and output $Y$ are given.Simply interchange $X$ and $F$ in the derivation and result.\subsection{Smoothing the denominator spectrum}\parEquation~(\ref{eqn:z1}) gives us one way to divide by zero.Another way is stated by the equation\begin{equation}X(\omega) \eq{ \overline{F(\omega)} \ Y(\omega) \over \langle \overline{F(\omega)} F(\omega) \rangle}\label{eqn:z6}\end{equation}where the strange notation in the denominator meansthat the spectrum there should be smoothed a little.Such smoothing fills in the holes in the spectrumwhere zero-division is a danger,filling not with an arbitrary numerical value $\epsilon$but with an average of nearby spectral values.Additionally, if the denominator spectrum$\overline{F(\omega)} F(\omega)$ is rough,the smoothing creates a shorter autocorrelation function.\parBoth divisions,equation~(\ref{eqn:z1}) andequation~(\ref{eqn:z6}),irritate us by requiring us to specify a parameter,but for the latter, the parameter has a clear meaning.In the latter case we smooth a spectrum with a smoothingwindow of width, say $\Delta\omega$which this corresponds inversely to a time interval over which we smooth.Choosing a numerical value for $\epsilon$ has not such a simple interpretation.\inputdir{antoine}\parWe jump from simple mathematical theorizingtowards a genuine practical application when I grab some real data,a function of time and space from another textbook.Let us call this data $f(t,x)$ and its 2-D Fourier transform$F(\omega, k_x)$.The data and its autocorrelation are in Figure~\ref{fig:antoine10}.\par\plot{antoine10}{width=6in,height=8.0in}{ 2-D data (right) and a quadrant of its autocorrelation (left). Notice the longest nonzero time lag on the data is about 5.5 sec which is the latest nonzero signal on the autocorrelation.}\parThe autocorrelation $a(t,x)$ of $f(t,x)$ isthe inverse 2-D Fourier Transform of $ \overline{F(\omega, k_x)} F(\omega, k_x)$.Autocorrelations $a(x,y)$satisfy the symmetry relation$a(x,y)=a(-x,-y)$.Figure~\ref{fig:antoine11}shows only the interesting quadrant of the two independent quadrants.We see the autocorrelation of a 2-D function has someresemblance to the function itself but differs in important ways.\parInstead of messing with two different functions $X$ and $Y$ to divide,let us divide $F$ by itself.This sounds like $1=F/F$ but we willwatch what happens when we do the division carefullyavoiding zero division in the ways we usually do.\parFigure~\ref{fig:antoine11} showswhat happens with\begin{equation}\label{eqn:z7}1 \eq F/F \quad\approx\quad { { \overline{F} F }\over{ \overline{F} F + \epsilon^2 }}\end{equation}and with\begin{equation}\label{eqn:z8}1 \eq F/F \quad\approx\quad { { \overline{F} F }\over{ \langle \overline{F} F \rangle }}\end{equation}From Figure~\ref{fig:antoine11} we notice that both methods ofavoiding zero division give similar results.By playing with the $\epsilon$ and the smoothing widththe pictures could be made even more similar.My preference, however, is the smoothing.It is difficult to make physical sense of choosing a numerical valuefor $\epsilon$.It is much easier to make physical sense of choosing a smoothing window.The smoothing window is in $(\omega,k_x)$ space,but Fourier transformation tells us its effect in $(t,x)$ space.\plot{antoine11}{width=6in,height=8.0in}{ Equation~(\ref{eqn:z7}) (left) and equation~(\ref{eqn:z8}) (right). Both ways of dividing by zero give similar results.}\subsection{Imaging}The example of dividing a function by itself $(1=F/F)$ might notseem to make much sense, but it is very closely related to estimationoften encounted in imaging applications.It's not my purpose here to give a lecture on imaging theory, buthere is an overbrief explanation.\parImagine a downgoing wavefield $D(\omega,x,z)$ and scatterer thatfrom the downgoing wavefield creates an upgoing wavefield $U(\omega,x,z)$.Given $U$ and $D$, if there is a stong temporal correlation between themat any $(x,z)$ it likely means there is a reflector nearby that ismanufacturing $U$ from $D$.This reflectivity could be quantified by $U/D$.At the earth's surface the surface boundary condition says something like$U=D$ or $U=-D$. Thus at the surface we have something like $F/F$.As we go down in the earth, the main difference is that $U$ and $D$ gettime shifted in opposite directions, so $U$ and $D$ are similar butfor that time difference. Thus, a study of how we handle $F/F$ is worthwhile.\subsection{Formal path to the low-cut filter}This book defines many geophysical estimation problems.Many of them amount to statement of two goals.The first goal is a data fitting goal,the goal that the model should imply some observed data.The second goal is that the model be not too big or too wiggly.We will state these goals as two residuals, each of which is ideally zero.A very simple data fitting goal would be thatthe model $m$ equals the data $d$,thus the difference should vanish, say $0\approx m- d$.A more interesting goal is that the model should match the dataespecially at high frequencies but not necessarily at low frequencies.\begin{equation}0 \quad\approx\quad -i\omega(m - d)\end{equation}A danger of this goal is that the model could have a zero-frequency componentof infinite magnitude as well as large amplitudes for low frequencies.To suppress this, we need the second goal, a model residualwhich is to be minimized. We need a small number $\epsilon$.The model goal is\begin{equation}0 \quad\approx\quad \epsilon \ m\end{equation}To see the consequence of these two goals,we add the squares of the residuals\begin{equation} Q(m) \eq \omega^2 (m-d)^2 + \epsilon^2 m^2\end{equation}and then we minimize $Q(m)$ by setting its derivative to zero\begin{equation}0\eq {dQ\over dm} \eq 2 \omega^2 (m-d) + 2\epsilon^2 m\end{equation}or\begin{equation}m \eq {\omega^2 \over \omega^2+ \epsilon^2}\ d\label{eqn:lowcut}\end{equation}which is a low-cut filterwith a cutoff frequency of $\omega_0=\epsilon$.%we found less formally earlier, equation (\ref{ajt/eqn:locut}).\parOf some curiosity and significance is the numerical choice of $\epsilon$.The general theory says we need an epsilon,but does not say how much.For now let us simply rename $\epsilon=\omega_0$and think of it as a ``cut off frequency''.%Our low-pass filter approach in Chapter \ref{ajt/paper:ajt}%made it quite clear that $\epsilon$ is a filter cutoff%which might better have been named $\omega_0$.%We experimented with some objective tests%for the correct value of $\omega_0$,%a subject that we will return to later.% %Figure~\ref{fig:antoine11}.%\activeplot{antoine11}{width=6in,height=8.5in}{ER}{% Smoothing the denominator spectrum. Update makefile.% } \section{MULTIVARIATE LEAST SQUARES}\subsection{Inside an abstract vector}In engineering uses,a vector has three scalar components thatcorrespond to the three dimensions of the space in which we live.In least-squares data analysis, a vector is a one-dimensional arraythat can contain many different things.Such an array is an ``\bx{abstract vector}.''For example, in earthquake studies,the vector might contain the timean earthquake began, as well as its latitude, longitude, and depth.Alternatively, the abstract vectormight contain as many components as there are seismometers,and each component might be the arrival time of an earthquake wave.Used in signal analysis,the vector might contain the values of a signalat successive instants in time or,alternatively, a collection of signals.These signals might be ``\bx{multiplex}ed'' (interlaced)or ``demultiplexed'' (all of each signal preceding the next).
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -