When used in image analysis, the one-dimensional array might contain an image,
which could itself be thought of as an array of signals.
Vectors, including abstract vectors, are usually denoted by \bx{boldface letters}
such as $\bold p$ and $\bold s$.
Like physical vectors, abstract vectors are \bx{orthogonal}
when their dot product vanishes: $\bold p \cdot \bold s = 0$.
Orthogonal vectors are well known in physical space;
we will also encounter them in abstract vector space.
\par
We consider first a hypothetical application
with one data vector $\dd$ and two
fitting vectors $\bold f_1$ and $\bold f_2$.
Each fitting vector is also known as a ``\bx{regressor}.''
Our first task is to approximate the data vector $\dd$
by a scaled combination of the two regressor vectors.
The scale factors $x_1$ and $x_2$
should be chosen so that the model matches the data; i.e.,
\begin{equation}
 \dd \quad \approx \quad \bold f_1 x_1 + \bold f_2 x_2
\label{eqn:wish1}
\end{equation}
\par
Notice that we could take the partial derivative
of the theoretical data in (\ref{eqn:wish1}) with respect to an unknown,
say $x_1$, and the result is the regressor $\bold f_1$.
\par
\boxit{
The \bx{partial derivative} of all theoretical data
with respect to any model parameter gives a \bx{regressor}.
A \bx{regressor} is a column in the matrix of partial derivatives,
$\partial d_i /\partial m_j$.
}
\par
The fitting goal (\ref{eqn:wish1}) is often expressed in the more compact
mathematical matrix notation $\dd \approx \bold F \bold x$,
but in our derivation here
we will keep track of each component explicitly
and use mathematical matrix notation to summarize the final result.
Fitting the observed data $\bold d = \bold d^{\rm obs}$
to its two theoretical parts $\bold f_1 x_1$ and $\bold f_2 x_2$
can be expressed as minimizing the length of the residual vector $\bold r$, where
\begin{eqnarray}
 \bold 0 \quad\approx\quad \bold r &=& \bold d^{\rm theor} - \bold d^{\rm obs} \\
 \bold 0 \quad\approx\quad \bold r &=& \bold f_1 x_1 + \bold f_2 x_2 \ -\ \dd
\label{eqn:resdef}
\end{eqnarray}
We use a dot product to construct a sum of squares
(also called a ``\bx{quadratic form}'')
of the components of the residual vector:
\begin{eqnarray}
Q(x_1,x_2) &=& \bold r \cdot \bold r \\
           &=& (\ff_1 x_1 + \ff_2 x_2 - \dd ) \cdot (\ff_1 x_1 + \ff_2 x_2 - \dd )
\end{eqnarray}
To find the gradient of the quadratic form $Q(x_1,x_2)$,
you might be tempted to expand the dot product into all nine terms
and then differentiate.
It is less cluttered, however, to remember the product rule, that
\begin{equation}
{d\over dx}\, \bold r \cdot \bold r
\eq
{d\bold r \over dx} \cdot \bold r
+
\bold r \cdot {d\bold r \over dx}
\end{equation}
Thus, the gradient of $Q(x_1,x_2)$ is defined by its two components:
\begin{eqnarray}
 {\partial Q \over \partial x_1} &=&
   \ff_1 \cdot (\ff_1 x_1 + \ff_2 x_2 - \dd ) +
   (\ff_1 x_1 + \ff_2 x_2 - \dd ) \cdot \ff_1 \\
 {\partial Q \over \partial x_2} &=&
   \ff_2 \cdot (\ff_1 x_1 + \ff_2 x_2 - \dd ) +
   (\ff_1 x_1 + \ff_2 x_2 - \dd ) \cdot \ff_2
\end{eqnarray}
Setting these derivatives to zero and using
$(\ff_1 \cdot \ff_2) = (\ff_2 \cdot \ff_1)$ etc.,
we get
\begin{eqnarray}
(\ff_1 \cdot \dd ) &=& (\ff_1 \cdot \ff_1) x_1 + (\ff_1 \cdot \ff_2) x_2 \\
(\ff_2 \cdot \dd ) &=& (\ff_2 \cdot \ff_1) x_1 + (\ff_2 \cdot \ff_2) x_2
\end{eqnarray}
We can use these two equations to solve for
the two unknowns $x_1$ and $x_2$.
Writing this expression in matrix notation, we have
\begin{equation}
\left[ \begin{array}{c}
  (\ff_1 \cdot \dd ) \\
  (\ff_2 \cdot \dd )
\end{array} \right]
\eq
\left[ \begin{array}{cc}
  (\ff_1 \cdot \ff_1) & (\ff_1 \cdot \ff_2) \\
  (\ff_2 \cdot \ff_1) & (\ff_2 \cdot \ff_2)
\end{array} \right] \;
\left[ \begin{array}{c}
  x_1 \\ x_2
\end{array} \right]
\label{eqn:2by2}
\end{equation}
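As an illustrative aside (not part of the original derivation), the following
short Python/NumPy sketch builds the $2\times 2$ system of equation
(\ref{eqn:2by2}) from dot products and solves it for $x_1$ and $x_2$;
the numerical values chosen for $\ff_1$, $\ff_2$, and $\dd$ are hypothetical.
\begin{verbatim}
import numpy as np

# Hypothetical three-component regressors and data vector.
f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([1.0, 0.0, -1.0])
d  = np.array([2.0, 3.0, 5.0])

# Dot products of the regressors with the data (left side of the 2-by-2 system).
b = np.array([np.dot(f1, d), np.dot(f2, d)])

# 2-by-2 matrix of dot products among the regressors.
A = np.array([[np.dot(f1, f1), np.dot(f1, f2)],
              [np.dot(f2, f1), np.dot(f2, f2)]])

x1, x2 = np.linalg.solve(A, b)   # least-squares scale factors
\end{verbatim}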
It is customary to use matrix notation without dot products.
To do this, we need some additional definitions.
To clarify these definitions,
we inspect vectors $\ff_1$, $\ff_2$, and $\dd$ of three components.
Thus
\begin{equation}
\bold F \eq [ \ff_1 \quad \ff_2 ] \eq
\left[ \begin{array}{cc}
  f_{11} & f_{12} \\
  f_{21} & f_{22} \\
  f_{31} & f_{32}
\end{array} \right]
\end{equation}
Likewise, the {\it transposed} matrix $\bold F'$ is defined by
\begin{equation}
\bold F' \eq
\left[ \begin{array}{ccc}
  f_{11} & f_{21} & f_{31} \\
  f_{12} & f_{22} & f_{32}
\end{array} \right]
\end{equation}
The matrix in equation (\ref{eqn:2by2}) contains dot products.
Matrix multiplication is an abstract way of representing the dot products:
\begin{equation}
\left[ \begin{array}{cc}
  (\ff_1 \cdot \ff_1) & (\ff_1 \cdot \ff_2) \\
  (\ff_2 \cdot \ff_1) & (\ff_2 \cdot \ff_2)
\end{array} \right]
\eq
\left[ \begin{array}{ccc}
  f_{11} & f_{21} & f_{31} \\
  f_{12} & f_{22} & f_{32}
\end{array} \right]
\left[ \begin{array}{cc}
  f_{11} & f_{12} \\
  f_{21} & f_{22} \\
  f_{31} & f_{32}
\end{array} \right]
\label{eqn:covmat}
\end{equation}
Thus, equation (\ref{eqn:2by2}) without dot products is
\begin{equation}
\left[ \begin{array}{ccc}
  f_{11} & f_{21} & f_{31} \\
  f_{12} & f_{22} & f_{32}
\end{array} \right] \;
\left[ \begin{array}{c}
  d_1 \\ d_2 \\ d_3
\end{array} \right]
\eq
\left[ \begin{array}{ccc}
  f_{11} & f_{21} & f_{31} \\
  f_{12} & f_{22} & f_{32}
\end{array} \right]
\left[ \begin{array}{cc}
  f_{11} & f_{12} \\
  f_{21} & f_{22} \\
  f_{31} & f_{32}
\end{array} \right]
\left[ \begin{array}{c}
  x_1 \\ x_2
\end{array} \right]
\end{equation}
which has the matrix abbreviation
\begin{equation}
\bold F' \dd \eq ( \bold F' \; \bold F ) \bold x
\label{eqn:analytic}
\end{equation}
Equation (\ref{eqn:analytic}) is the classic result of least-squares
fitting of data to a collection of regressors.
Obviously, the same matrix form applies when there are more than
two regressors and each vector has more than three components.
Equation (\ref{eqn:analytic}) leads to an \bx{analytic solution}
for $\bold x$ using an inverse matrix.
To solve formally for the unknown $\bold x$,
we premultiply by the inverse matrix $( \bold F' \; \bold F )^{-1}$:
\par
\boxit{
\begin{equation}
\bold x \eq ( \bold F' \; \bold F )^{-1} \; \bold F' \dd
\label{eqn:sln}
\end{equation}
Equation (\ref{eqn:sln}) is the central result of
\bxbx{least-squares}{least squares, central equation of} theory.
We see it everywhere.
}
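As a further illustrative sketch (not taken from the text), the same small
hypothetical example can be solved in the matrix notation of equation
(\ref{eqn:sln}) with NumPy. Forming $(\bold F' \bold F)^{-1}$ explicitly is
avoidable in practice; a linear solve or \texttt{numpy.linalg.lstsq} does the
same job more stably.
\begin{verbatim}
import numpy as np

f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([1.0, 0.0, -1.0])
d  = np.array([2.0, 3.0, 5.0])

F = np.column_stack([f1, f2])        # 3-by-2 matrix of regressors

# x = (F'F)^{-1} F'd, computed as a linear solve rather than an explicit inverse.
x = np.linalg.solve(F.T @ F, F.T @ d)

# Equivalent library routine for least-squares fitting.
x_lstsq, *_ = np.linalg.lstsq(F, d, rcond=None)
\end{verbatim}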
\par
In our first manipulation of matrix algebra,
we move around some parentheses in (\ref{eqn:analytic}):
\begin{equation}
\bold F' \dd \eq \bold F' \; (\bold F \bold x )
\label{eqn:comp}
\end{equation}
Moving the parentheses implies a regrouping of terms
or a reordering of a computation.
You can verify the validity of moving the parentheses
if you write (\ref{eqn:comp}) in full as the set of two equations it represents.
Equation (\ref{eqn:analytic}) led to the ``analytic'' solution (\ref{eqn:sln}).
In a later section on conjugate directions,
we will see that equation (\ref{eqn:comp})
expresses better than (\ref{eqn:sln})
the philosophy of iterative methods.
\par
Notice how equation (\ref{eqn:comp})
invites us to cancel the matrix $\bold F'$ from each side.
We cannot do that, of course, because
$\bold F'$ is not a number, nor is it a square matrix with an inverse.
If you really want to cancel the matrix $\bold F'$, you may,
but the equation is then only an approximation
that restates our original goal (\ref{eqn:wish1}):
\begin{equation}
\dd \quad \approx \quad \bold F \bold x
\label{eqn:wish2}
\end{equation}
\par
A speedy problem solver might
ignore the mathematics covering the previous page,
study his or her application until he or she
is able to write the \bx{statement of goals} \sx{goals, statement of}
(\ref{eqn:wish2}) = (\ref{eqn:wish1}),
premultiply by $\bold F'$,
replace $\approx$ by $=$, getting~(\ref{eqn:analytic}),
and take (\ref{eqn:analytic})
to a simultaneous equation-solving program to get $\bold x$.
\par
What I call ``\bx{fitting goals}'' are called
``\bx{regressions}'' by statisticians.
In common language the word regression means
to ``trend toward a more primitive perfect state,''
which vaguely resembles reducing the size of (energy in)
the residual $\bold r = \bold F \bold x - \bold d$.
Formally this is often written as:
\begin{equation}
\min_{\bold x} \ \ || \bold F \bold x - \bold d ||
\label{eqn:norm}
\end{equation}
The notation above with two pairs of vertical lines
looks like double absolute value,
but we can understand it as a reminder to square and sum all the components.
This formal notation is more explicit
about what is constant and what is variable during the fitting.
\subsection{Normal equations}
\par
An important concept is that when the energy is minimum,
the residual is orthogonal to the fitting functions.
The fitting functions are the column vectors $\bold f_1$ and $\bold f_2$.
Let us verify only that the dot product $\bold r \cdot \bold f_2$ vanishes
at the energy minimum; the same argument applies to each fitting function.
The energy minimum is found by
\begin{equation}
0 \quad = \quad {\partial\over \partial x_2}\ \bold r \cdot \bold r
  \quad = \quad 2\; \bold r \cdot {\partial \bold r\over \partial x_2}
  \quad = \quad 2\; \bold r \cdot \bold f_2
\label{eqn:orthogtofit}
\end{equation}
(To compute the derivative, refer to equation (\ref{eqn:resdef}).)
Equation (\ref{eqn:orthogtofit}) shows that
the residual is orthogonal to a fitting function.
The fitting functions are the column vectors in the fitting matrix.
\par
The basic least-squares equations are often called
the ``\bx{normal}'' equations.
The word ``normal'' means perpendicular.
We can rewrite equation (\ref{eqn:comp}) to emphasize the perpendicularity.
Bring both terms to the left,
and recall the definition of the residual $\bold r$
from equation (\ref{eqn:resdef}):
\begin{eqnarray}
\label{eqn:fitorth1}
\bold F' ( \bold F \bold x - \dd) &=& \bold 0 \\
\bold F' \bold r &=& \bold 0
\label{eqn:fitorth2}
\end{eqnarray}
Equation (\ref{eqn:fitorth2}) says that the \bx{residual} vector $\bold r$
is perpendicular to each row in the $\bold F'$ matrix.
These rows are the \bx{fitting function}s.
Therefore, the residual, after it has been minimized,
is perpendicular to {\it all} the fitting functions.
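A short numerical check (again an illustrative sketch, reusing the same
hypothetical numbers as above) confirms equation (\ref{eqn:fitorth2}):
after the fit, $\bold F' \bold r$ vanishes to roundoff,
so the residual is orthogonal to every fitting function.
\begin{verbatim}
import numpy as np

f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([1.0, 0.0, -1.0])
d  = np.array([2.0, 3.0, 5.0])
F  = np.column_stack([f1, f2])

x = np.linalg.solve(F.T @ F, F.T @ d)   # least-squares coefficients
r = F @ x - d                           # residual r = Fx - d

# F'r should be ~0: the residual is perpendicular to each column of F.
print(F.T @ r)
\end{verbatim}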
%\begin{notforlecture}
\subsection{Differentiation by a complex vector}
\sx{differentiate by complex vector}
\sx{complex vector}
\sx{complex operator}
\par
Complex numbers frequently arise in physical problems,
particularly those with Fourier series.
Let us extend the multivariable least-squares theory
to the use of complex-valued unknowns $\bold x$.
First recall how complex numbers were handled
with single-variable least squares;
i.e.,~as in the discussion leading up to equation~(\ref{eqn:z5}).
Use a prime, such as $\bold x'$, to denote the complex conjugate
of the transposed vector $\bold x$.
Now write the positive \bx{quadratic form} as
\begin{equation}
Q(\bold x', \bold x) \eq
(\bold F\bold x - \bold d)'(\bold F\bold x - \bold d) \eq
(\bold x' \bold F' - \bold d')(\bold F\bold x - \bold d)
\label{eqn:6-1-23}
\end{equation}
After equation (\ref{eqn:z4}),
we minimized a quadratic form $Q(\bar X,X)$
by setting to zero both
$\partial Q/\partial \bar X$ and $\partial Q/\partial X$.
We noted that only one of
$\partial Q/\partial \bar X$ and $\partial Q/\partial X$
need be set to zero,
because they are conjugates of each other.
Now take the derivative of $Q$
with respect to the (possibly complex, row) vector $\bold x'$.
Notice that $\partial Q/\partial \bold x'$ is the complex conjugate transpose
of $\partial Q/\partial \bold x$.
Thus, setting one to zero sets the other also to zero.
Setting $\partial Q/\partial \bold x' = \bold 0$ gives the normal equations:
\begin{equation}
\bold F' ( \bold F \bold x - \bold d ) \eq \bold 0
\end{equation}
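For the complex case, the only change in such a numerical sketch is that the
prime becomes the complex conjugate transpose; the complex values below are,
again, hypothetical.
\begin{verbatim}
import numpy as np

# Hypothetical complex regressors (columns of F) and complex data.
F = np.array([[1.0 + 1.0j, 2.0 - 1.0j],
              [0.5 - 2.0j, 1.0 + 0.0j],
              [3.0 + 0.0j, 0.0 + 1.0j]])
d = np.array([1.0 + 2.0j, -1.0 + 0.5j, 2.0 - 1.0j])

Fp = F.conj().T                       # F' = complex conjugate transpose

x = np.linalg.solve(Fp @ F, Fp @ d)   # normal equations F'(Fx - d) = 0
r = F @ x - d
print(Fp @ r)                         # ~0: residual orthogonal to the regressors
\end{verbatim}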