{phang}{opt nocolpoints}
suppresses the table with column point (category) statistics.
{phang}{opt compact}
specifies that the tables with point statistics be displayed with the values
multiplied by 1000, so that more columns can be displayed without wrapping the
output. The compact tables can be displayed without wrapping for models with
two dimensions at linesize 79 and with three dimensions at linesize 99.
{phang}{opt plot}
displays a plot of the row and column coordinates in two dimensions. With
row principal normalization, only the row points are plotted. With column
principal normalization, only the column points are plotted. In the other
normalizations, both row and column points are plotted. You can use
{helpb cabiplot} directly if you need another selection of points to be
plotted, or if you want to otherwise refine the plot.
{phang}{opt maxlength(#)}
specifies the maximum number of characters for labels.
The default is {cmd:maxlength(12)}.
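{pstd}
As a sketch of how these options combine, assuming that the variables
{cmd:rank} and {cmd:smoking} from the examples below are in memory: the first
two commands request a three-dimensional solution with compact point tables,
which should then fit without wrapping at linesize 99. The last command uses
row principal normalization so that only the row points are plotted,
suppresses the column point table, and limits labels to 8 characters (a value
chosen only for illustration).
{phang2}{cmd:. set linesize 99}{p_end}
{phang2}{cmd:. ca rank smoking, dim(3) compact}{p_end}
{phang2}{cmd:. ca rank smoking, normalize(row) nocolpoints maxlength(8) plot}{p_end}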
{title:Remarks}
{title:Normalization and the interpretation of CA}
{pstd}
The normalization method used in the CA determines whether and how the
similarity of the row categories, the similarity of the column categories, and
the relationship (association) between the row and column variables can be
interpreted in terms of the row and column coordinates and the origin of the
plot.
{pstd}
How does one "compare row points"{hline 2}provided that the normalization
method allows such a comparison? Formally, the Euclidean distance between two
row points approximates the chi-squared distance between the corresponding
row profiles. Thus, in the biplot, row categories mapped close together have
similar row profiles, i.e., similar distributions on the column variable.
Row categories mapped far apart have dissimilar row profiles.
Moreover, the Euclidean distance between a row point and the origin
approximates the chi-squared distance between the row profile and the row
centroid (the average row profile, which equals the marginal distribution of
the column variable), and so indicates how much the category differs from the
population as a whole.
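{pstd}
The quantities that the row points approximate can also be computed directly.
The following Mata sketch uses the frequencies of the {cmd:camat} example
below; the matrix names {cmd:P}, {cmd:r}, {cmd:c}, {cmd:R}, {cmd:d12}, and
{cmd:d10} are illustrative only. It computes the row profiles, the
chi-squared distance between the first two row profiles (senior and junior
managers), and the chi-squared distance between the first row profile and the
row centroid, i.e., the column masses. Each squared difference is divided by
the corresponding column mass.
{phang2}{cmd:. mata:}{p_end}
{phang2}{cmd:: F = (4,2,3,2 \ 4,3,7,4 \ 25,10,12,4 \ 18,24,33,13 \ 10,6,7,2)}{p_end}
{phang2}{cmd:: P = F :/ sum(F)        // cell proportions; P sums to 1}{p_end}
{phang2}{cmd:: r = rowsum(P)          // row masses (5 x 1)}{p_end}
{phang2}{cmd:: c = colsum(P)          // column masses = row centroid (1 x 4)}{p_end}
{phang2}{cmd:: R = P :/ r             // row profiles; each row sums to 1}{p_end}
{phang2}{cmd:: d12 = sqrt(rowsum((R[1,.] - R[2,.]):^2 :/ c))   // rows 1 and 2}{p_end}
{phang2}{cmd:: d10 = sqrt(rowsum((R[1,.] - c):^2 :/ c))        // row 1 to centroid}{p_end}
{phang2}{cmd:: d12, d10}{p_end}
{phang2}{cmd:: end}{p_end}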
{pstd}
An analogous interpretation applies to column points.
{pstd}
For the association between the row and column variables, note that in the CA
biplot one should not interpret the distance between a row point r and a
column point c as the relationship between r and c. Instead, think in terms
of the vectors origin-to-r (OR) and origin-to-c (OC).
Remember that CA decomposes the scaled deviations d(r,c) from
independence, and that d(r,c) is approximated by the inner product of OR
and OC. The larger the absolute value of d(r,c), the stronger the
association between r and c. In geometric terms, the inner product of OR and
OC is the product of the length of OR, the length of OC, and the
cosine of the angle between OR and OC.
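{pstd}
The decomposition can be sketched in Mata for the frequencies of the
{cmd:camat} example below. This sketch assumes that the scaled deviations are
the standardized residuals d(r,c) = (p(r,c) - p(r)p(c))/sqrt(p(r)p(c)), where
p(r,c) is the cell proportion and p(r) and p(c) are the row and column masses,
and it splits the singular values evenly between the row vectors {cmd:A} and
the column vectors {cmd:B} (illustrative names). The coordinates reported by
{cmd:ca} are additionally scaled by the masses and depend on the
normalization, so these numbers need not match the {cmd:ca} output; the
sketch only shows that a two-dimensional inner product approximates d(r,c)
and equals the product of the two lengths and the cosine of the angle.
{phang2}{cmd:. mata:}{p_end}
{phang2}{cmd:: F = (4,2,3,2 \ 4,3,7,4 \ 25,10,12,4 \ 18,24,33,13 \ 10,6,7,2)}{p_end}
{phang2}{cmd:: P = F :/ sum(F)}{p_end}
{phang2}{cmd:: E = rowsum(P) * colsum(P)     // expected proportions under independence}{p_end}
{phang2}{cmd:: D = (P - E) :/ sqrt(E)        // scaled deviations d(r,c)}{p_end}
{phang2}{cmd:: U = .; s = .; Vt = .}{p_end}
{phang2}{cmd:: svd(D, U, s, Vt)              // D = U*diag(s)*Vt}{p_end}
{phang2}{cmd:: S = diag(sqrt(s))}{p_end}
{phang2}{cmd:: V = Vt'}{p_end}
{phang2}{cmd:: A = U[., (1,2)] * S[(1,2), (1,2)]   // row vectors OR, 2 dimensions}{p_end}
{phang2}{cmd:: B = V[., (1,2)] * S[(1,2), (1,2)]   // column vectors OC, 2 dimensions}{p_end}
{phang2}{cmd:: ip = A[1,.] * B[1,.]'         // inner product for row 1, column 1}{p_end}
{phang2}{cmd:: cosang = ip / (norm(A[1,.]) * norm(B[1,.]))}{p_end}
{phang2}{cmd:: D[1,1], ip, norm(A[1,.]), norm(B[1,.]), cosang}{p_end}
{phang2}{cmd:: end}{p_end}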
{pstd}
What does this mean? First, consider the effect of the angle. The
association in (r,c) is strongly positive if OR and OC point
in roughly the same direction; the frequency of (r,c) is then much higher than
expected under independence, and so r and c tend to occur together. Note
that this is typically the case if the points r and c are close together and
not near the origin.
Similarly, the association is strongly negative if OR and OC
point in opposite directions. In this case, the frequency of (r,c) is much
lower than expected under independence, and so r and c are unlikely to
occur together. Finally, if OR and OC are roughly
orthogonal (an angle of about 90 degrees), the deviation from independence is
small.
{pstd}
Second, the strength of the association of r and c increases with the lengths of
OR and OC. Points far away from the origin tend to have large
associations. If a category is mapped close to the origin, all its
associations with categories of the other variable are small; in other words,
its distribution resembles the marginal distribution.
{pstd}
Here are the interpretations enabled by the main normalization methods as
specified in the {cmd:normalize()} option.
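    {hline 54}
    {col 22}similarity{col 34}similarity{col 46}association
    method{col 22}row cat.{col 34}column cat.{col 46}row vs column
    {hline 54}
    {opt symmetric}{col 22}no{col 34}no{col 46}yes
    {opt principal}{col 22}yes{col 34}yes{col 46}no
    {opt row}{col 22}yes{col 34}no{col 46}yes
    {opt column}{col 22}no{col 34}yes{col 46}yes
    {hline 54}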
{pstd}
When we say that a comparison between row categories or between column
categories is not possible, we mean that the chi-squared distance between row
profiles or between column profiles is approximated not by the ordinary
Euclidean distance between the respective points but by a weighted Euclidean
distance in which the weights depend on the inertia of the dimensions.
{pstd}
You may want to perform a CA in principal normalization to study the
similarity among the categories of each variable, and a CA in symmetric
normalization to study the association between the row and column
categories.
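{pstd}
For instance, assuming the variables {cmd:rank} and {cmd:smoking} from the
examples below are in memory, the two analyses could be requested as follows.
In the first plot, compare distances among row points and among column
points; in the second, interpret the directions and lengths of the vectors
from the origin.
{phang2}{cmd:. ca rank smoking, normalize(principal) plot}{p_end}
{phang2}{cmd:. ca rank smoking, normalize(symmetric) plot}{p_end}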
{title:Example with ca}
{pstd}
{cmd:ca} creates the two-way frequency table from individual-level data and
performs a correspondence analysis (CA) of this table.
{phang2}{cmd:. ca rank smoking}{p_end}
{phang2}{cmd:. ca rank smoking, dim(3)}{p_end}
{phang2}{cmd:. bysort gender: ca rank smoking, plot}{p_end}
{pstd}
We want to include in the analysis the distribution of smoking estimated in a
national sample. This is called a supplementary row. The data for
supplementary points are entered as a matrix with one row and 4 columns,
one for each smoking category:
{phang2}{cmd:. matrix SR = (42, 29, 20, 9)}{p_end}
{phang2}{cmd:. matrix rownames SR = national}{p_end}
{phang2}{cmd:. ca rank smoking, rowsupp(SR) plot}{p_end}
{title:Example with camat}
{pstd}
To conduct a correspondence analysis of data in tabular format it is
convenient to store the data in a Stata matrix and to use {cmd:camat} instead
of {cmd:ca}. Consider this table.
{center:{txt}{hline 16}{c TT}{hline 31}}
{center:{txt}                {c |}            smoking            }
{center:{txt}      personnel {c |}      none  light medium  heavy}
{center:{hline 16}{c +}{hline 31}}
{center:{txt} senior manager {c |}{res}         4      2      3      2}
{center:{txt} junior manager {c |}{res}         4      3      7      4}
{center:{txt}senior employee {c |}{res}        25     10     12      4}
{center:{txt}junior employee {c |}{res}        18     24     33     13}
{center:{txt}      secretary {c |}{res}        10      6      7      2}
{center:{txt}{hline 16}{c BT}{hline 31}}
{pstd}
The following code creates a Stata matrix {cmd:F} with the frequencies, and
with the appropriate row and column names. Note that the row labels are
abbreviated, and spaces are replaced by underscores.
{phang2}
{cmd:. matrix F = ( 4,2,3,2 \ 4,3,7,4 \ 25,10,12,4 \ 18,24,33,13 \ 10,6,7,2 )}
{p_end}
{phang2}
{cmd:. matrix colnames F = none light medium heavy}
{p_end}
{phang2}
{cmd:. matrix rownames F = sen_mngr jun_mngr sen_empl jun_employ secr}
{p_end}
{pstd}
To conduct the CA with 2 dimensions (the default) and produce a plot, invoke
{cmd:camat} on {cmd:F}.
{phang2}{cmd:. camat F, rowname(rank) colname(smoking) plot}{p_end}
{pstd}
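We add two supplementary columns with the numbers of nondrinking and drinking
subjects in each staff category, entered as a matrix with 5 rows (one for each
staff category) and 2 columns. The supplementary row {cmd:SR} from the
{cmd:ca} example is included as well; it has one column for each of the 4
smoking categories, matching the columns of {cmd:F}.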
{phang2}{cmd:. matrix SC = ( 0,11 \ 1,17 \ 5,46 \ 10,78 \ 7,18)}{p_end}
{phang2}{cmd:. matrix colnames SC = nondrink drink}{p_end}
{phang2}{cmd:. camat F, rowsupp(SR) colsupp(SC) plot}{p_end}
{title:Also see}
{psee}
Manual: {bf:[MV] ca}
{p_end}
{psee}
Online: {help ca postestimation};{break}
{helpb biplot},
{helpb canon},
{helpb mds},
{helpb pca},
{helpb tabulate}
{p_end}