{smcl}
{* 04apr2005}{...}
{cmd:help matrix dissimilarity}
{hline}
{title:Title}
{p2colset 5 33 35 2}{...}
{p2col:{hi:[P] matrix dissimilarity} {hline 2}}Compute similarity or
dissimilarity measures{p_end}
{p2colreset}{...}
{title:Syntax}
{p 8 29 2}
{cmdab:mat:rix} {cmdab:dis:similarity}
{it:matname} {cmd:=} [{varlist}]
{ifin}
{bind:[{cmd:,} {it:options}]}
{p2colset 5 23 25 2}{...}
{p2col:{it:options}}description{p_end}
{p2line}
{p2col:{it:{help measure_option:measure}}}similarity or dissimilarity measure;
default is {cmd:L2} (Euclidean){p_end}
{p2col:{opt obs:ervations}}compute similarities or dissimilarities between
observations; the default{p_end}
{p2col:{opt var:iables}}compute similarities or dissimilarities between
variables{p_end}
{p2col:{opth name:s(varname)}}row/column names for {it:matname} (allowed with
{opt observations}){p_end}
{p2col:{opt allb:inary}}check that all values are 0, 1, or missing{p_end}
{p2col:{opt prop:ortions}}interpret values as proportions of binary
values{p_end}
{p2col:{cmd:dissim(}{it:{help matrix_dissimilarity##method:method}}{cmd:)}}change
similarity measure to dissimilarity{p_end}
{p2line}
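{marker method}{...}
{pstd}
where {it:method} transforms similarities to dissimilarities using one of the
following:{p_end}
{p2colset 9 23 25 2}{...}
{p2col:{opt oneminus}}d_ij = 1 - s_ij{p_end}
{p2col:{opt st:andard}}d_ij = sqrt(s_ii + s_jj - 2*s_ij){p_end}
{p2colreset}{...}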
{title:Description}
{pstd}
{cmd:matrix dissimilarity} computes a similarity, dissimilarity, or distance
matrix. The similarity or dissimilarity between each observation (or variable
if the {cmd:variables} option is specified) and the others is placed in
{it:matname}. The element in the {it:i}th row and {it:j}th column gives
either the similarity or dissimilarity between the {it:i}th and {it:j}th
observation (or variable). Whether you get a similarity or a dissimilarity
depends upon the requested {it:measure}; see {it:{help measure_option}}.
{pstd}
If there is a large number of observations (or variables, when the
{cmd:variables} option is specified), you may need to increase the maximum
matrix size; see {help matsize}. If the number of observations (or
variables) is so large that storing the results in a matrix is not practical,
you may wish to consider using the {cmd:cluster measures} command, which stores
similarities or dissimilarities in variables; see {help cluster programming}.
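{pstd}
As a minimal sketch, assuming a dataset of roughly 500 observations and a
Stata flavor that permits the value, the limit could be raised before the
matrix is created. The result has one row and one column per observation, so
the limit must be at least the number of observations; the figure 800 is
illustrative only.{p_end}
{phang2}{cmd:. set matsize 800}{p_end}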
{pstd}
When computing similarities or dissimilarities between observations, the
default row and column names of {it:matname} are set to {cmd:obs}{it:#},
where {it:#} is the observation number. The {cmd:names()} option allows you
to override this default. For similarities or dissimilarities between
variables, the row and column names of {it:matname} are set to the appropriate
variable names.
{pstd}
When you compute similarities or dissimilarities between observations, the
order of the rows and columns corresponds to the order of your observations.
Warning: if you reorder your data (e.g., using {helpb sort} or
{helpb gsort}) after running {cmd:matrix dissimilarity}, the row and column
ordering will no longer match your data.
{title:Options}
{phang}
{it:measure} specifies one of the similarity or dissimilarity measures allowed
by Stata. The default is {cmd:L2}, Euclidean distance. Numerous
similarity and dissimilarity measures are provided for continuous data and
for binary data; see {it:{help measure_option}}.
{phang}
{cmd:observations} and {cmd:variables}
specify whether similarities or dissimilarities are computed between
observations or variables. The default is {cmd:observations}.
{phang}
{cmd:names(}{it:varname}{cmd:)}
provides row and column names for {it:matname}. {it:varname} must be a
string variable with a length of 32 or less. You will want to pick a
{it:varname} that yields unique values for the row and column names.
Uniqueness of values is not checked by {cmd:matrix dissimilarity}.
{cmd:names()} is not allowed with the {cmd:variables} option. When the
similarities or dissimilarities are computed between observations, the
default row and column names are {cmd:obs}{it:#}, where {it:#} is the
observation number corresponding to that row or column.
{phang}
{cmd:allbinary}
checks that all values are 0, 1, or {help missing}. Stata treats nonzero
values as one (excluding missing values) when dealing with what are
supposed to be binary data (including binary similarity {it:measure}s).
{cmd:allbinary} causes {cmd:matrix dissimilarity} to exit with an error
message if the values are not truly binary. {cmd:allbinary} is not
allowed with {cmd:proportions}.
{phang}
{cmd:proportions}
is for use with binary similarity {it:measure}s. It indicates that values
are to be interpreted as proportions of binary values. The default action
treats all nonzero values as one (excluding missing values). With
{cmd:proportions}, the values are confirmed to be between zero and one
inclusive. See {it:{help measure_option}} for a discussion of the use of
proportions with binary {it:measure}s. {cmd:proportions} is not allowed
with {cmd:allbinary}.
{phang}
{opt dissim(method)}
specifies that similarity measures are to be transformed into
dissimilarity measures. {it:method} may be {cmd:oneminus} or
{cmd:standard}. {cmd:oneminus} transforms similarities to dissimilarities
using d_ij = 1-s_ij. {cmd:standard} uses d_ij = sqrt(s_ii+s_jj-2*s_ij).
{cmd:dissim()} does nothing when the {it:measure} is already a
dissimilarity or distance. See {it:{help measure_option}} to see which
{it:measure}s are similarities.
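{pstd}
As an illustrative calculation, assume a similarity {it:measure} for which
s_ii = s_jj = 1 and a pair of observations with s_ij = 0.5 (values chosen
only for this example). Then {cmd:oneminus} gives d_ij = 1 - 0.5 = 0.5, and
{cmd:standard} gives d_ij = sqrt(1 + 1 - 2*0.5) = sqrt(1) = 1.{p_end}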
{title:Examples}
{pstd}
Place in matrix {cmd:D} the Euclidean distance (the default) between all the
observations (the default) for variables {cmd:x1}, {cmd:x2}, and {cmd:x3}.
{phang2}{cmd:. mat dis D = x1 x2 x3}{p_end}
{pstd}
Compute the matching coefficient similarity measure between the first five
observations for variables {cmd:c1} through {cmd:c35}, placing the result in
matrix {cmd:m}. Verify that the data are truly binary, and give matrix
{cmd:m} row and column names from the string variable {cmd:id}.
{phang2}{cmd:. mat dis m = c1-c35 in 1/5, matching allbinary names(id)}{p_end}
{pstd}
Create matrix {cmd:canbvars} holding the Canberra distance between all the
variables. (Notice that no variables are listed after the {cmd:=}, defaulting
to all variables being included.)
{phang2}{cmd:. mat dis canbvars = , Canberra variables}{p_end}
{pstd}
Create matrix {cmd:canbobs} holding the Canberra distance between the
observations using all the variables.
{phang2}{cmd:. mat dis canbobs = , Canberra}{p_end}
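{pstd}
As a sketch of the {cmd:dissim()} option, and assuming the same binary
variables {cmd:c1} through {cmd:c35} used earlier, the matching coefficient
can be converted to a dissimilarity with the {cmd:oneminus} method. The
matrix name {cmd:md} is hypothetical.{p_end}
{phang2}{cmd:. mat dis md = c1-c35, matching dissim(oneminus)}{p_end}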
{title:Also see}
{psee}
Manual: {bf:[P] matrix dissimilarity}
{psee}
Online: {it:{help measure_option}},
{helpb matrix};
{helpb cluster},
{help cluster programming},
{helpb clustermat},
{helpb mdsmat},
{helpb parse_dissim}
{p_end}