variables' contributions are significant. Wilks's lambda is =
sometimes called=20
<I>the U statistic</I>.=20
<P><B>Wilks's lambda </B>is also used in a second context of =
discriminant=20
analysis, to test the significance of the discriminant function as a =
whole.=20
<UL>
<P></P></UL>
<P></P></LI></UL>
<P><A name=3Dassoc></A></P>
<LI><B>Measuring strength of relationships</B>=20
<UL>
<P><A name=3Dtable></A>
<LI>The <B>classification table</B>, also called a confusion, =
assignment, or=20
prediction matrix or table, is used to assess the performance of DA. =
This is=20
simply a table in which the rows are the observed categories of the=20
dependent and the columns are the predicted categories of the =
dependent.=20
When prediction is perfect, all cases will lie on the diagonal. The=20
percentage of cases on the diagonal is the percentage of correct=20
classifications. This percentage is called the <B>hit ratio</B>.=20
<P><A name=3Ddsquare></A></P>
<LI><B>Mahalanobis D-Square</B> and <B>Rao's V</B> are two other =
indexes of=20
the extent to which the discriminant functions discriminate between=20
criterion groups.=20
<P><A name=3Dcanonic></A></P>
<LI><B>Canonical correlation, R<SUB>c</SUB>: </B>Squared canonical=20
correlation, R<SUB>c</SUB><SUP>2</SUP>, is the percent of variation =
in the=20
dependent discriminated by the set of independents in DA.=20
<P></P></LI></UL>
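<P>The classification table and hit ratio described above can be
sketched in a few lines. This is only an illustration: the function
names and the eight observed/predicted cases are hypothetical, not
SPSS output.</P>

```python
from collections import Counter


def classification_table(observed, predicted):
    """Cross-tabulate observed groups (rows) against predicted groups (columns)."""
    labels = sorted(set(observed) | set(predicted))
    counts = Counter(zip(observed, predicted))
    table = [[counts[(row, col)] for col in labels] for row in labels]
    return labels, table


def hit_ratio(table):
    """Share of cases on the diagonal, i.e. the correctly classified cases."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / total


# Hypothetical data: observed vs. predicted group membership for 8 cases.
observed = ["a", "a", "a", "b", "b", "b", "b", "a"]
predicted = ["a", "a", "b", "b", "b", "a", "b", "a"]

labels, table = classification_table(observed, predicted)
print(labels)            # row and column order of the table
print(table)             # rows are observed groups, columns are predicted
print(hit_ratio(table))  # proportion of cases on the diagonal
```

<P>Off-diagonal cells show which groups are confused with one another,
which the single hit ratio does not reveal.</P>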
<P><A name=3Dinterp></A></P>
<LI><B>Interpreting the discriminant functions</B>=20
<UL>
<P><A name=3Dmatrix></A>
<LI>The <B>structure matrix table</B> shows the correlations of each =
variable with each discriminant function. These simple Pearsonian=20
correlations are called <B>structure coefficients or =
correlations</B> or=20
<B>discriminant loadings</B>. When the dependent has more than two=20
categories there will be more than one discriminant function. In =
that case,=20
there will be multiple columns in the table, one for each function. =
The=20
correlations then serve like factor loadings in factor analysis -- =
that is,=20
by identifying the largest absolute correlations associated with =
each=20
discriminant function the researcher gains insight into how to name =
each=20
function.=20
<P><A name=3Dstructure></A><I>Structure coefficients vs. =
standardized=20
discriminant function coefficients.</I> The standardized =
discriminant=20
function coefficients indicate the partial contribution of each =
variable to=20
the discriminant function(s), controlling for other independents =
entered in=20
the equation. The structure coefficients indicate the simple =
correlations=20
between the variables and the discriminant function or functions. =
The=20
structure coefficients should be used to assign meaningful labels to =
the=20
discriminant functions. The standardized discriminant function =
coefficients=20
should be used to assess each independent variable's unique =
contribution to=20
the discriminant function.=20
<P><A name=3DMahal></A></P>
<LI><B>Mahalanobis distances</B> are used in analyzing cases in =
discriminant=20
analysis. For instance, one might wish to analyze a new, unknown set =
of=20
cases in comparison to an existing set of known cases. Mahalanobis =
distance=20
is the distance between a case and the centroid for each group (of =
the=20
dependent) in attribute space (n-dimensional space defined by n =
variables). A=20
case will have one Mahalanobis distance for each group, and it will =
be=20
classified as belonging to the group for which its Mahalanobis =
distance is=20
smallest. Thus, the smaller the Mahalanobis distance, the closer the =
case is=20
to the group centroid and the more likely it is to be classed as =
belonging=20
to that group. Since Mahalanobis distance is measured in terms of =
standard=20
deviations from the centroid, a case which is more than =
1.96=20
Mahalanobis distance units from the centroid has less than .05 =
chance of=20
belonging to the group represented by the centroid; 3 units would =
likewise=20
correspond to less than .01 chance. SPSS reports squared Mahalanobis =
distance. </LI></UL>
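<P>As a minimal sketch of classification by smallest Mahalanobis
distance, the example below assumes two variables and a single
covariance matrix shared by all groups. The group names, centroids,
and covariance values are hypothetical. Like SPSS, it works with the
squared distance.</P>

```python
def mahalanobis_sq(x, centroid, cov):
    """Squared Mahalanobis distance of a 2-D point x from a group centroid."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of the 2x2 covariance matrix, written out by hand.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - centroid[0], x[1] - centroid[1])
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))


def classify(x, centroids, cov):
    """Assign x to the group whose centroid is nearest in Mahalanobis terms."""
    distances = {g: mahalanobis_sq(x, c, cov) for g, c in centroids.items()}
    return min(distances, key=distances.get), distances


# Hypothetical centroids for two groups and an assumed shared covariance.
centroids = {"low": (0.0, 0.0), "high": (4.0, 4.0)}
cov = ((4.0, 0.0), (0.0, 1.0))

group, d2 = classify((1.0, 1.0), centroids, cov)
print(group, d2)
```

<P>Because the first variable has the larger variance here, a unit of
difference on it counts for less than a unit on the second variable.</P>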
<P><A name=3Dassume></A></P>
<LI><B>Tests of Assumptions</B>=20
<P>
<UL><A name=3Dboxm></A>
<LI><B>Box's M</B> tests the null hypothesis that the covariance =
matrices do=20
not differ between groups formed by the dependent. This is an =
assumption of=20
discriminant analysis. The researcher wants this test <U>not</U> to =
be=20
significant, so as to accept the null hypothesis that the groups do =
not=20
differ. This test is also very sensitive to departures from=20
multivariate normality. Note, though, that DA can be robust even when=20
this assumption is violated. </LI></UL>
<P><A name=3Dvalidity></A></P>
<LI><B>Validation</B>=20
<UL>
<P><A name=3Dholdout></A>
<LI>A <B>hold-out sample</B> is often used for validation of the=20
discriminant function. This is a split-halves test, in which a portion=20
of the cases is assigned to the <I>analysis sample</I> for purposes of=20
training the discriminant function, which is then validated by=20
assessing its performance on the remaining cases in the hold-out=20
sample. </LI></UL>
<P></P></LI></UL>
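<P>The hold-out procedure can be sketched as follows. To stay short,
the sketch substitutes a nearest-centroid rule on one variable for a
fitted discriminant function. That substitution, the function names,
the within-group split, and the data are all assumptions made for
illustration.</P>

```python
import random


def split_holdout(cases, holdout_share=0.5, seed=0):
    """Randomly split (value, group) cases into analysis and hold-out samples.

    The split is done within each group so both samples contain every group.
    """
    rng = random.Random(seed)
    analysis, holdout = [], []
    for group in sorted({g for _, g in cases}):
        members = [case for case in cases if case[1] == group]
        rng.shuffle(members)
        cut = int(len(members) * holdout_share)
        holdout.extend(members[:cut])
        analysis.extend(members[cut:])
    return analysis, holdout


def fit_centroids(cases):
    """Train on the analysis sample: one mean per group."""
    sums = {}
    for x, g in cases:
        total, n = sums.get(g, (0.0, 0))
        sums[g] = (total + x, n + 1)
    return {g: total / n for g, (total, n) in sums.items()}


def predict(centroids, x):
    """Assign x to the group with the nearest centroid."""
    return min(centroids, key=lambda g: abs(x - centroids[g]))


# Hypothetical, well-separated data: six cases in each of two groups.
cases = ([(float(v), "a") for v in range(0, 6)]
         + [(float(v), "b") for v in range(20, 26)])

analysis, holdout = split_holdout(cases)
centroids = fit_centroids(analysis)
hits = sum(predict(centroids, x) == g for x, g in holdout)
holdout_hit_ratio = hits / len(holdout)
print(holdout_hit_ratio)
```

<P>The hit ratio on the hold-out sample is the figure of interest,
since those cases played no part in training.</P>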
<P><BR><A name=3DSPSS></A>
<H2>SPSS Output Examples</H2>
<UL>
<LI><A=20
=
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim2.htm">Discriminan=
t=20
Function Analysis (two groups)</A>=20
<LI><A =
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim3.htm">Multiple=20
Discriminant Function Analysis (three groups)</A> </LI></UL>
<P><BR><A name=3Dassumptions></A>
<H2>Assumptions</H2>
<UL>
<LI>The dependent variable is a true dichotomy. When the range of a =
true=20
underlying continuous variable is constrained to form a dichotomy, =
correlation=20
is attenuated (biased toward underestimation). One should never =
dichotomize a=20
continuous variable simply for the purpose of applying discriminant =
function=20
analysis.=20
<P></P>
<LI>All cases must be independent and must belong to a group formed by =
the=20
dependent variable. The groups must be mutually exclusive, with every =
case=20
belonging to only one group.=20
<P></P>
<LI>Group sizes of the dependent are not grossly different.=20
<P></P>
<LI>There must be at least two cases for each category of the =
dependent.=20
<P></P>
<LI>The independent variable or variables are interval-level. As with =
other=20
members of the regression family, dichotomies, dummy variables, and =
ordinal=20
variables with at least 5 categories are commonly used as well.=20
<P></P>
<LI>The maximum number of independent variables is n-2, where n is the =
sample=20
size.=20
<P></P>
<LI>No independents have a zero standard deviation in one or more of =
the=20
groups formed by the dependent.=20
<P></P>
<LI>Errors (residuals) are randomly distributed.=20
<P><A name=3Dhov1></A></P>
<LI>Homogeneity of variances (homoscedasticity): the variance of each=20
interval independent should be similar across the groups formed by the=20
dependent. That is, the independents may (and will) have different=20
variances one from another, but for the same independent, the groups=20
formed by the dependent should have similar variances on that=20
independent. Discriminant analysis is highly sensitive to outliers.=20
Lack of homogeneity of variances may indicate the presence of outliers=20
in one or more groups.=20
<P><A name=3Dhoc1></A></P>
<LI>Homogeneity of covariances/correlations: within each group formed =
by the=20
dependent, the covariance/correlation between any two predictor =
variables=20
should be similar to the corresponding covariance/correlation in other =
groups.=20
That is, each group has a similar covariance/correlation matrix.=20
<P></P>
<LI>Absence of perfect multicollinearity. If one independent is very=20
highly correlated with another, or one is a function (e.g., the sum)=20
of other=20
independents, then the tolerance value for that variable will approach =
0 and=20
the matrix will not have a unique discriminant solution. Such a matrix =
is said=20
to be <I>ill-conditioned</I>. Tolerance is discussed in the section on =
<A=20
=
href=3D"http://www2.chass.ncsu.edu/garson/pa765/regress.htm#toleranc">reg=
ression</A>.=20
<P></P>
<LI>Low multicollinearity of the independents. To the extent that =
independents=20
are correlated, the standardized discriminant function coefficients =
will not=20
reliably assess the relative importance of the predictor variables.=20
<P></P>
<LI>Assumes linearity (does not take into account exponential terms =
unless=20
such transformed variables are added as additional independents).=20
<P></P>
<LI>Assumes additivity (does not take into account interaction terms =
unless=20
new crossproduct variables are added as additional independents).=20
<P></P>
<LI>For purposes of significance testing, predictor variables follow=20
multivariate normal distributions. That is, each predictor variable =
has a=20
normal distribution about fixed values of all the other independents. =
</LI></UL>
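<P>The tolerance check mentioned under multicollinearity can be
illustrated for the simplest case. Tolerance for an independent is 1
minus the R-squared from regressing it on the other independents.
With only two independents, which is the assumption made here, that
R-squared is the squared Pearson correlation between them. The
variable names and values are hypothetical.</P>

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def tolerance_two_predictors(xs, ys):
    """Tolerance of either independent when there are exactly two of them."""
    return 1.0 - pearson_r(xs, ys) ** 2


# Hypothetical independents: x2 is an exact multiple of x1, x3 is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]
x3 = [2.0, 1.0, 4.0, 3.0, 5.0]

tol_redundant = tolerance_two_predictors(x1, x2)  # approaches 0
tol_moderate = tolerance_two_predictors(x1, x3)
print(tol_redundant, tol_moderate)
```

<P>With more than two independents, the R-squared has to come from a
multiple regression of each independent on all the others.</P>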
<P><BR>
<H2>Frequently Asked Questions</H2>
<UL>
<LI><A=20
=
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#cluster"><B>I=
sn't=20
discriminant analysis the same as cluster analysis?</B></A>=20
<LI><A=20
=
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#constant"><B>=
When=20
does the discriminant function have no constant term?</B></A>=20
<LI><A =
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#hov"><B>How=20
important is it that the assumptions of homogeneity of variances and =
of=20
multivariate normal distribution be met?</B></A>=20
<LI><A =
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#betas"><B>In =
DA, how can you assess the relative importance of the discriminating=20
variables?</B></A>=20
<LI><A =
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#mle"><B>What =
is the maximum likelihood estimation method in discriminant analysis =
(logistic=20
discriminant function analysis)?</B></A>=20
<LI><A=20
=
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#fisher"><B>Wh=
at are=20
Fisher's linear discriminant functions? </B></A>
<LI><A =
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#step"><B>What=
=20
is stepwise DA?</B></A>=20
<LI><A =
href=3D"http://www2.chass.ncsu.edu/garson/pa765/discrim.htm#mancova"><B>I=
=20
have heard DA is related to MANCOVA. How so?</B></A>=20
<P><BR><A name=3Dcluster></A></P>
<LI><B>Isn't discriminant analysis the same as cluster analysis?</B>=20