{smcl}
{* 25mar2005}{...}
{cmd:help mata cross()}
{hline}
{* index cross()}{...}
{* index cross product}{...}
{* index product}{...}
{title:Title}
{p 4 8 2}
{bf:[M-5] cross() -- Cross products}
{title:Syntax}
{p 8 12 2}
{it:real matrix}
{cmd:cross(}{it:X}{cmd:,}
{it:Z}{cmd:)}
{p 8 12 2}
{it:real matrix}
{cmd:cross(}{it:X}{cmd:,}
{it:w}{cmd:,}
{it:Z}{cmd:)}
{p 8 12 2}
{it:real matrix}
{cmd:cross(}{it:X}{cmd:,}
{it:xc}{cmd:,}
{it:Z}{cmd:,}
{it:zc}{cmd:)}
{p 8 12 2}
{it:real matrix}
{cmd:cross(}{it:X}{cmd:,}
{it:xc}{cmd:,}
{it:w}{cmd:,}
{it:Z}{cmd:,}
{it:zc}{cmd:)}
{p 4 8 2}
where

{p 8 12 2}
{it:X}:  {it:real matrix X}{break}
{it:xc}:  {it:real scalar xc}{break}
{it:w}:  {it:real vector w}{break}
{it:Z}:  {it:real matrix Z}{break}
{it:zc}:  {it:real scalar zc}
{title:Description}
{p 4 4 2}
{cmd:cross()} makes calculations of the form

{p 12 12 2}
{it:X}'{it:X}{break}
{it:X}'{it:Z}{break}
{it:X}{bf:'}diag({it:w}){it:X}{break}
{it:X}{bf:'}diag({it:w}){it:Z}
{p 4 4 2}
{cmd:cross()} is designed for making calculations
that often arise in statistical formulas.
In one sense,
{cmd:cross()} does nothing that you cannot easily write out
in standard matrix notation. For instance,
{cmd:cross(}{it:X}{cmd:,}
{it:Z}{cmd:)}
calculates {it:X}'{it:Z}.
{cmd:cross()}, however, has the following differences and
advantages over the standard matrix-notation approach:
{p 8 12 2}
1. {cmd:cross()} omits the rows in {it:X} and {it:Z}
that contain missing values, which amounts to dropping observations with
missing values (see the illustration following this list).
{p 8 12 2}
2. {cmd:cross()} uses less memory and is especially efficient
when used with views.
{p 8 12 2}
3. {cmd:cross()} watches for special cases and makes calculations
in those special cases more efficiently. For instance, if you code
{bind:{cmd:cross(}{it:X}{cmd:,} {it:X}{cmd:)}}, {cmd:cross()}
observes that the two matrices are the same and makes the calculation
for a symmetric matrix result.
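{p 4 4 2}
For illustration, here is a small made-up example of the missing-value
behavior in item 1 (the matrices below are invented for this sketch):

{cmd}: X = (1, 2 \ 3, 4 \ ., 6)
: Z = (10 \ 20 \ 30)
: cross(X, Z)    // row 3 of X has a missing value, so row 3 of both
: 		 //   matrices is dropped: (1,2\3,4)'(10\20) = (70\100){txt}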
{p 4 4 2}
{cmd:cross(}{it:X}{cmd:,}
{it:Z}{cmd:)}
returns {it:X}'{it:Z}.
Usually
rows({it:X})==rows({it:Z}), but {it:X} is
also allowed to be a scalar,
which is then treated as if
J(rows({it:Z}), 1, 1) were specified. Thus
{cmd:cross(1,} {it:Z}{cmd:)} is equivalent to
{cmd:colsum(}{it:Z}{cmd:)}.
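{p 4 4 2}
A quick check of the scalar case, with a made-up {it:Z}:

{cmd}: Z = (1, 2 \ 3, 4 \ 5, 6)
: cross(1, Z)    // treated as J(3,1,1)'Z, giving the column sums (9, 12)
: colsum(Z)      // same result{txt}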
{p 4 4 2}
{cmd:cross(}{it:X}{cmd:,}
{it:w}{cmd:,}
{it:Z}{cmd:)}
returns {it:X}{bf:'}diag({it:w}){it:Z}.
Usually, rows({it:w})==rows({it:Z})
or cols({it:w})==rows({it:Z}), but {it:w}
is also allowed to be a scalar, which is treated as
if
J(rows({it:Z}), 1, {it:w}) were specified. Thus
{cmd:cross(}{it:X}{cmd:,1,}{it:Z}{cmd:)}
is the same as {cmd:cross(}{it:X}{cmd:,}{it:Z}{cmd:)}.
{it:X} may also be a scalar, just as in the two-argument case.
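{p 4 4 2}
For illustration, the weighted form agrees with the written-out matrix
expression (small invented values):

{cmd}: X = (1, 2 \ 3, 4)
: Z = (5 \ 6)
: w = (2 \ 3)
: cross(X, w, Z)    // X'diag(w)Z
: X'*diag(w)*Z      // same result, but forms the rows(Z) x rows(Z) diag(w){txt}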
{p 4 4 2}
{cmd:cross(}{it:X}{cmd:,}
{it:xc}{cmd:,}
{it:Z}{cmd:,}
{it:zc}{cmd:)}
is similar to
{cmd:cross(}{it:X}{cmd:,}
{it:Z}{cmd:)} in that
{it:X}'{it:Z} is returned.
In the four-argument case, however, {it:X} is augmented on the
right with a column of
1s if {it:xc}!=0 and {it:Z} is similarly augmented if {it:zc}!=0.
{cmd:cross(}{it:X}{cmd:,}
{cmd:0,}
{it:Z}{cmd:,}
{cmd:0)}
is equivalent to
{cmd:cross(}{it:X}{cmd:,}
{it:Z}{cmd:)}. {it:X} may be specified as a scalar.
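{p 4 4 2}
The augmentation is equivalent to appending the column of 1s by hand
(small invented values):

{cmd}: X = (1, 2 \ 3, 4)
: Z = (5 \ 6)
: cross(X, 1, Z, 0)                 // (X, 1s)'Z
: cross((X, J(rows(X),1,1)), Z)     // same result{txt}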
{p 4 4 2}
{cmd:cross(}{it:X}{cmd:,}
{it:xc}{cmd:,}
{it:w}{cmd:,}
{it:Z}{cmd:,}
{it:zc}{cmd:)}
is similar to
{cmd:cross(}{it:X}{cmd:,}
{it:w}{cmd:,}
{it:Z}{cmd:)}
in that
{it:X}{bf:'}diag({it:w}){it:Z} is returned.
As with the four-argument {cmd:cross()},
{it:X} is augmented on the right with a column of
1s if {it:xc}!=0 and {it:Z} is similarly augmented if {it:zc}!=0.
Both {it:X} and {it:w} may be specified as scalars.
{cmd:cross(}{it:X}{cmd:,}
{cmd:0,}
{cmd:1,}
{it:Z}{cmd:,}
{cmd:0)}
is equivalent to
{cmd:cross(}{it:X}{cmd:,}
{it:Z}{cmd:)}.
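{p 4 4 2}
Putting augmentation and weights together (small invented values):

{cmd}: X = (1, 2 \ 3, 4)
: Z = (5 \ 6)
: w = (2 \ 3)
: cross(X, 1, w, Z, 0)              // (X, 1s)'diag(w)Z
: (X, J(rows(X),1,1))'*diag(w)*Z    // same result{txt}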
{title:Remarks}
{p 4 4 2}
In the following examples, we are going to calculate linear regression
coefficients using {it:b} = ({it:X}'{it:X})^(-1){it:X}'{it:y}, means using
Sum({it:x})/{it:n}, and variances using (Sum({it:x}^2)/{it:n} -
{it:mean}^2)*{it:n}/({it:n}-1).
See {bf:{help mf_crossdev:[M-5] crossdev()}} for examples of the
same calculations made in a more numerically stable way.
{p 4 4 2}
The examples use the automobile data. Since we are using the absolute form
of the calculation equations, it would be better if all variables had
values near 1 (in which case the absolute form of the calculation equations
is perfectly adequate). Thus we suggest

{p 8 12 2}
{cmd:. use auto}{break}
{cmd:. replace weight = weight/1000}
{p 4 4 2}
Some of the examples use a weight {cmd:w}. For that, you might try

{p 8 12 2}
{cmd:. gen w = int(4*uniform()+1)}
{title:Example 1: Linear regression, the traditional way}
{cmd}: y = X = .
: st_view(y, ., "mpg")
: st_view(X, ., ("weight", "foreign"))
:
: X = X, J(rows(X),1,1)
: b = invsym(X'X)*X'y{txt}
{p 4 4 2}
{it:Comments:}
Does not handle missing values and uses lots of memory if {cmd:X} is large.
{title:Example 2: Linear regression using cross()}
{cmd}: y = X = .
: st_view(y, ., "mpg")
: st_view(X, ., ("weight", "foreign"))
:
: XX = cross(X,1 , X,1)
: Xy = cross(X,1 , y,0)
: b = invsym(XX)*Xy{txt}
{p 4 4 2}
{it:Comments:}
There is still an issue with missing values: mpg might be missing in
observations where weight and foreign are not, in which case {cmd:XX} and
{cmd:Xy} would be computed from different samples.
{title:Example 3: Linear regression using cross() and a single view}
{cmd}: // We will form
: //
: // (y X)'(y X) = (y'y, y'X \ X'y, X'X)
:
: M = .
: st_view(M, ., ("mpg", "weight", "foreign"), 0)
:
: CP = cross(M,1 , M,1)
: XX = CP[|2,2 \ .,.|]
: Xy = CP[|2,1 \ .,1|]
: b = invsym(XX)*Xy{txt}
{p 4 4 2}
{it:Comments:}
Using a single view handles all missing-value issues: specifying 0 as the
fourth argument to {cmd:st_view()} omits observations that contain missing
values; see {bf:{help mf_st_view:[M-5] st_view()}}.
{title:Example 4: Linear regression using cross() and subviews}
{cmd}: M = X = y = .
: st_view(M, ., ("mpg", "weight", "foreign"), 0)
: st_subview(y, M, ., 1)
: st_subview(X, M, ., (2\.))
:
: XX = cross(X,1 , X,1)
: Xy = cross(X,1 , y,0)
: b = invsym(XX)*Xy{txt}
{p 4 4 2}
{it:Comments:}
Using subviews also handles all missing-value issues; see
{bf:{help mf_st_subview:[M-5] st_subview()}}.
The subview approach is a little less efficient than the previous
solution but is perhaps easier to understand.
The efficiency issue concerns only the
extra memory required by the subviews {cmd:y} and {cmd:X}, which is not
much.
{p 4 4 2}
Also note that this subview solution could be used to handle the
missing-value problems of calculating linear regression coefficients
the traditional way, shown in example 1:
{cmd}: M = X = y = .
: st_view(M, ., ("mpg", "weight", "foreign"), 0)
: st_subview(y, M, ., 1)
: st_subview(X, M, ., (2\.))
:
: X = X, J(rows(X), 1, 1)
: b = invsym(X'X)*X'y
{title:Example 5: Weighted linear regression, the traditional way}
{cmd}: M = w = y = X = .
: st_view(M, ., ("w", "mpg", "weight", "foreign"), 0)
: st_subview(w, M, ., 1)
: st_subview(y, M, ., 2)
: st_subview(X, M, ., (3\.))
:
: X = X, J(rows(X), 1, 1)
: b = invsym(X'diag(w)*X)*X'diag(w)*y{txt}
{p 4 4 2}
{it:Comments:}
The memory requirements are now truly impressive because
{cmd:diag(w)} is an {it:N} {it:x} {it:N} matrix! That is, the memory
requirements are truly
impressive when {it:N} is large. Part of the power of Mata is that
you can write things like {cmd:invsym(X'diag(w)*X)*X'diag(w)*y}
and obtain solutions.
We do not mean to be dismissive of the traditional approach; we merely
wish to emphasize its memory requirements and note that there are
alternatives.
{title:Example 6: Weighted linear regression using cross()}
{cmd}: M = w = y = X = .
: st_view(M, ., ("w", "mpg", "weight", "foreign"), 0)
: st_subview(w, M, ., 1)
: st_subview(y, M, ., 2)
: st_subview(X, M, ., (3\.))
:
: XX = cross(X,1 ,w, X,1)
: Xy = cross(X,1 ,w, y,0)
: b = invsym(XX)*Xy{txt}
{p 4 4 2}
{it:Comments:}
The memory requirements here are no greater than they were in example 4,
which this example closely mirrors. Alternatively, we could have mirrored
the logic of example 3, as sketched below.
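{p 4 4 2}
One way that alternative might look (a sketch in the spirit of example 3;
the subview name {cmd:V} is invented here) is

{cmd}: M = w = V = .
: st_view(M, ., ("w", "mpg", "weight", "foreign"), 0)
: st_subview(w, M, ., 1)
: st_subview(V, M, ., (2\.))
:
: CP = cross(V,1 ,w, V,1)
: XX = CP[|2,2 \ .,.|]
: Xy = CP[|2,1 \ .,1|]
: b = invsym(XX)*Xy{txt}

{p 4 4 2}
As in example 3, {cmd:CP} contains the full weighted cross product
(y X 1)'diag(w)(y X 1), from which {cmd:XX} and {cmd:Xy} are then extracted.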
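{p 4 4 2}
The Remarks above also mention means and variances. A minimal sketch of
those calculations using {cmd:cross()} (invented here; see
{bf:{help mf_crossdev:[M-5] crossdev()}} for the more numerically stable
approach) is

{cmd}: x = .
: st_view(x, ., "mpg", 0)
: n    = rows(x)
: xbar = cross(1, x)/n                       // Sum(x)/n
: xvar = (cross(x, x)/n - xbar^2)*n/(n-1)    // (Sum(x^2)/n - mean^2)*n/(n-1){txt}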