pa 765 logistic regression
regression model will reduce our errors in classifying the dependent by 80% compared to classifying the dependent by always guessing a case is to be classed the same as the most frequent category of the dichotomous dependent. Lambda-p is an adjustment to classic lambda to assure that the coefficient will be positive when the model helps and negative when, as is possible, the model actually leads to worse predictions than simple guessing based on the most frequent class. Lambda-p varies from 1 to (1 - N), where N is the number of cases. Lambda-p = (f - e)/f, where f is the smallest row frequency (smallest row marginal in the classification table) and e is the number of errors (the 1,0 and 0,1 cells in the classification table).
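<P>The lambda-p formula above can be sketched in a few lines of code (Python is used here purely for illustration; the classification table counts are hypothetical):</P>

```python
# Sketch: lambda-p for a 2x2 classification table (rows = observed 0/1,
# columns = predicted 0/1). Formula from the text: lambda-p = (f - e) / f,
# where f is the smallest row marginal and e is the number of errors
# (the off-diagonal 0,1 and 1,0 cells).

def lambda_p(table):
    """table[i][j] = count of cases observed as i and predicted as j."""
    errors = table[0][1] + table[1][0]      # the 0,1 and 1,0 cells
    f = min(sum(table[0]), sum(table[1]))   # smallest row marginal
    return (f - errors) / f

# Hypothetical table: 40 observed 0's (35 predicted correctly) and
# 20 observed 1's (12 predicted correctly).
table = [[35, 5],
         [8, 12]]
print(lambda_p(table))  # (20 - 13) / 20 = 0.35
```

<P>Note that a model worse than guessing yields a negative value, as the text describes.</P>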
<P><A name=taup></A></P>
<LI><B>Tau-p</B> is an alternative measure of association. When the classification table has equal marginal distributions, tau-p varies from -1 to +1, but otherwise may be less than 1. Negative values mean the logistic model does worse than expected by chance. Tau-p can be lower than lambda-p because it penalizes proportional reduction in error for non-random distribution of errors (that is, it wants an equal number of errors in each of the error quadrants in the table).
<P><A name=phip></A></P>
<LI><B>Phi-p</B> is a third alternative discussed by Menard (pp. 29-30) but is not part of SPSS output. Phi-p varies from -1 to +1 for tables with equal marginal distributions.
<P><A name=binomial></A></P>
<LI><B>Binomial d</B> is a significance test for any of these measures of association, though in each case the number of "errors" is defined differently (see Menard, pp. 30-31).
<P><A name=separation></A></P>
<LI><B>Separation: </B>Note that when the independents completely predict the dependent, the error quadrants in the classification table will contain 0's, which is called <I>complete separation</I>. When this is nearly the case, as when the error quadrants have only one case, this is called <I>quasicomplete separation</I>. When separation occurs, one will get very large logit coefficients with very high standard errors. While separation may indicate powerful and valid prediction, often it is a sign of a problem with the independents, such as definitional overlap between the indicators for the independent and dependent variables. </LI></OL>
<P><A name=cstat></A></P>
<LI>The <B>c statistic</B> is a measure of the discriminatory power of the logistic equation. It varies from .5 (the model's predictions are no better than chance) to 1.0 (the model always assigns higher probabilities to correct cases than to incorrect cases). Thus c is the proportion of all possible pairs of cases in which the model assigns a higher probability to a correct case than to an incorrect case. The c statistic is not part of SPSS output but may be calculated using the COMPUTE facility, as described in the SPSS manual's chapter on logistic regression.
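<P>The pairwise definition of c can be sketched directly (Python; the predicted probabilities are hypothetical, and tied pairs are counted as half, the usual convention, which makes c equivalent to the area under the ROC curve):</P>

```python
# Sketch: c as the proportion of (event, non-event) pairs in which the
# event case receives the higher predicted probability. Ties count half.

def c_statistic(probs, observed):
    events = [p for p, y in zip(probs, observed) if y == 1]
    nonevents = [p for p, y in zip(probs, observed) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

# Hypothetical predicted probabilities and observed outcomes:
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
observed = [1, 1, 0, 1, 0, 0]
print(c_statistic(probs, observed))  # 8 of 9 pairs concordant = 0.888...
```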
<P><A name=classplot></A></P>
<LI>The <B>classplot or histogram of predicted probabilities</B>, also called the "plot of observed groups and predicted probabilities," is part of SPSS output, and is an alternative way of assessing correct and incorrect predictions under logistic regression. The X axis is the predicted probability from 0.0 to 1.0 of the dependent being classified "1". The Y axis is frequency: the number of cases classified. Inside the plot are columns of observed 1's and 0's. Thus a column with one "1" and five "0's" set at p = .25 would mean that six cases were predicted to be "1's" with a probability of .25, and thus were classified as "0's." Of these, five actually were "0's" but one (an error) was a "1" on the dependent variable. Examining this plot will tell such things as how well the model classifies difficult cases (ones near p = .5).
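<P>A rough text rendering of such a plot can be sketched as follows (Python, hypothetical data; SPSS's actual classplot is formatted differently, but the idea of binning cases by predicted probability and stacking the observed 0's and 1's is the same):</P>

```python
# Sketch: a crude text classplot. Cases are binned by predicted
# probability; each bin prints as a column of observed "1"s stacked
# above observed "0"s.

def classplot(probs, observed, bins=10):
    cols = [[] for _ in range(bins)]
    for p, y in zip(probs, observed):
        i = min(int(p * bins), bins - 1)   # which probability bin
        cols[i].append(str(y))
    for c in cols:
        c.sort()                           # 0's first, so 1's print on top
    height = max(len(c) for c in cols)
    for row in range(height - 1, -1, -1):  # tallest rows first
        print(" ".join(c[row] if row < len(c) else " " for c in cols))
    print("-" * (2 * bins - 1))
    return cols

probs = [0.15, 0.22, 0.25, 0.27, 0.55, 0.62, 0.81, 0.88, 0.91]
observed = [0, 0, 1, 0, 0, 1, 1, 1, 1]
classplot(probs, observed)
```

<P>In this hypothetical output, the column near p = .25 contains two 0's and one 1: three cases classified "0," one of them an error, just as in the reading example above.</P>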
<P><A name=LL></A></P>
<LI><B>Log likelihood</B>: A "likelihood" is a probability, specifically the probability that the observed values of the dependent may be predicted from the observed values of the independents. Like any probability, the likelihood varies from 0 to 1. The log likelihood (LL) is its log and varies from 0 to minus infinity (it is negative because the log of any number less than 1 is negative). LL is calculated through <I>iteration</I>, using a maximum likelihood method.
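<P>For a fitted model, LL is the sum over cases of y&middot;ln(p) + (1 - y)&middot;ln(1 - p), where p is the predicted probability of a "1". A minimal sketch (Python; the predicted probabilities are hypothetical stand-ins for what the iterative estimation would produce):</P>

```python
import math

# Sketch: log likelihood of the observed outcomes given the model's
# predicted probabilities. The likelihood itself is exp(LL), a
# probability between 0 and 1, so LL runs from 0 toward minus infinity.

def log_likelihood(probs, observed):
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(probs, observed))

probs = [0.8, 0.6, 0.3, 0.1]   # hypothetical predicted probabilities
observed = [1, 1, 0, 0]
ll = log_likelihood(probs, observed)
print(ll)            # negative, since each per-case likelihood is below 1
print(math.exp(ll))  # the likelihood itself, between 0 and 1
```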
<P>
<OL><A name=deviance></A>
<LI><B>Deviance</B>. Because -2LL has approximately a chi-square distribution, -2LL can be used for assessing the significance of logistic regression, analogous to the use of the sum of squared errors in OLS regression. The -2LL statistic is the "scaled deviance" statistic for logistic regression and is also called "deviation chi-square," D<FONT size=-2>M</FONT>, L-square, or "badness of fit." Deviance reflects error associated with the model even after the independents are included in the model. It thus has to do with the significance of the <U>unexplained</U> variance in the dependent. One wants -2LL <U>not</U> to be significant. That is, Significance(-2LL) should be worse than (greater than) .05. SPSS calls this "-2 Log Likelihood" in the chi-square output column. There are a number of variants:
<P><A name=initial></A></P>
<LI><B>Initial chi-square</B>, also called D<FONT size=-2>O</FONT> or deviance for the null model, reflects the error associated with the model when only the intercept is included in the model. D<FONT size=-2>O</FONT> is -2LL for the model which includes only the intercept. That is, initial chi-square is -2LL for the model which accepts the null hypothesis that all the b coefficients are 0. SPSS calls this the "initial log likelihood function -2 log likelihood".
<P><A name=badness></A>
<P><A name=modelchi></A></P>
<LI><B>Model chi-square</B> is also known as G<FONT size=-2>M</FONT>, Hosmer and Lemeshow's G, -2LL<SUB>difference</SUB>, or just "goodness of fit." Model chi-square functions as a significance test, like the F test of the OLS regression model or the Likelihood Ratio (G<SUP>2</SUP>) test in <A href="http://www2.chass.ncsu.edu/garson/pa765/logit.htm#ratio">loglinear analysis</A>. Model chi-square provides the usual significance test for a logistic model. Model chi-square tests the null hypothesis that <U>none</U> of the independents are linearly related to the log odds of the dependent. That is, model chi-square tests the null hypothesis that all population logistic regression coefficients except the constant are zero. It is thus an overall model test which does not assure that <U>every</U> independent is significant.
<P>Model chi-square is computed as -2LL for the null (initial) model minus -2LL for the researcher's model. The null model, also called the initial model, is logit(p) = the constant. Degrees of freedom equal the number of terms in the model minus the constant (this is the same as the difference in the number of terms between the two models, since the null model has only one term). Model chi-square measures the improvement in fit that the explanatory variables make compared to the null model. Note SPSS calls -2LL for the null model "Initial Log Likelihood".
<P>Model chi-square is a likelihood ratio test which reflects the difference between error not knowing the independents (initial chi-square) and error when the independents are included in the model (deviance). Thus, model chi-square = initial chi-square - deviance. Model chi-square follows a chi-square distribution (unlike deviance) with degrees of freedom equal to the difference in the number of parameters in the examined model compared to the model with only the intercept. Model chi-square is the denominator in the formula for R<FONT size=-2>L</FONT>-square (see below). When the probability of model chi-square is less than or equal to .05, we reject the null hypothesis that knowing the independents makes no difference in predicting the dependent in logistic regression. Thus we want model chi-square to be significant at the .05 level or better.
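<P>The computation "model chi-square = initial chi-square - deviance" can be sketched directly. In this Python sketch the null model predicts the sample proportion of 1's for every case, and the fitted probabilities are hypothetical stand-ins for what maximum likelihood estimation would produce:</P>

```python
import math

# Sketch: model chi-square as the drop in -2LL from the null
# (intercept-only) model to the fitted model.

def minus_2ll(probs, observed):
    return -2 * sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(probs, observed))

observed = [1, 1, 1, 0, 0, 0, 0, 0]
p_null = sum(observed) / len(observed)        # 0.375 for every case
null_2ll = minus_2ll([p_null] * len(observed), observed)   # initial chi-square
fitted = [0.9, 0.7, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]          # hypothetical model
deviance = minus_2ll(fitted, observed)
model_chi_square = null_2ll - deviance
print(model_chi_square)   # the improvement over the null model
```

<P>The resulting value would then be referred to a chi-square distribution whose degrees of freedom equal the number of parameters added beyond the intercept.</P>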
<P><I>Block chi-square</I> is a likelihood ratio test also printed by SPSS, representing the change in model chi-square due to entering a block of variables. <I>Step chi-square</I> is the change in model chi-square due to a given step in <A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#stepwise">stepwise logistic regression</A>. Earlier versions of SPSS referred to these as "improvement chi-square." If variables are added one at a time, then block and step chi-square will be equal, of course. <I>Note on categorical variables: </I>block chi-square is used to test the effect of entering a categorical variable. In such a case, all dummy variables associated with the categorical variable are entered as a block. The resulting block chi-square value is considered more reliable than the Wald test, which can be misleading for large effects in finite samples.
<P>These are alternatives to model chi-square for significance testing of logistic regression:
<P>
<UL><A name=goodness></A>
<LI><B>Goodness of Fit</B>, also known as <I>Hosmer and Lemeshow's Goodness of Fit Index</I> or <I>C-hat,</I> is an alternative to model chi-square for assessing the significance of a logistic regression model. Menard (p. 21) notes it may be better when the number of combinations of values of the independents is approximately equal to the number of cases under analysis. This measure was included in SPSS output as "Goodness of Fit" prior to Release 10. However, it was removed from the reformatted output for SPSS Release 10 because, as noted by David Nichols, senior statistician for SPSS, it "is done on individual cases and does not follow a known distribution under the null hypothesis that the data were generated by the fitted model, so it's not of any real use" (SPSSX-L listserv message, 3 Dec. 1999).
<P><A name=Hosmer></A></P>
<LI><B>Hosmer and Lemeshow's Goodness of Fit Test</B>, not to be confused with ordinary Goodness of Fit above, tests the null hypothesis that the data were generated by the model fitted by the researcher. The test divides subjects into deciles based on predicted probabilities, then computes a chi-square from observed and expected frequencies. Then a probability (p) value is computed from the chi-square distribution with 8 degrees of freedom to test the fit of the logistic model. If the Hosmer and Lemeshow Goodness-of-Fit test statistic is .05 or less, we reject the null hypothesis that there is no difference between the observed and model-predicted values of the dependent (that is, the model's predictions differ significantly from the observed values). If the H-L goodness-of-fit test statistic is greater than .05, as we want, we fail to reject the null hypothesis that there is no difference, implying that the model's estimates fit the data at an acceptable level. This does not mean that the model necessarily explains much of the variance in the dependent, only that however much or little it does explain is significant. As with other tests, as the sample size gets larger, the H-L test's power to detect differences from the null hypothesis improves.
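<P>The grouping step of the H-L test can be sketched as follows (Python; the data are hypothetical, and deciles would correspond to g = 10 groups; real implementations differ in how they break ties and form groups). A p-value would then be read from a chi-square distribution with g - 2 degrees of freedom:</P>

```python
# Sketch: the Hosmer-Lemeshow statistic. Cases are sorted by predicted
# probability, split into g groups, and within each group the observed
# and expected counts of 1's and 0's are compared chi-square style.

def hosmer_lemeshow(probs, observed, g=10):
    cases = sorted(zip(probs, observed))     # order by predicted p
    n = len(cases)
    stat = 0.0
    for i in range(g):
        group = cases[i * n // g:(i + 1) * n // g]
        if not group:
            continue
        obs1 = sum(y for _, y in group)      # observed 1's in the group
        exp1 = sum(p for p, _ in group)      # expected 1's in the group
        obs0 = len(group) - obs1
        exp0 = len(group) - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat

# Two groups of five hypothetical cases; observed counts match expected
# counts exactly, so the statistic is 0 (a perfect fit).
probs = [0.2] * 5 + [0.8] * 5
observed = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(hosmer_lemeshow(probs, observed, g=2))  # 0.0
```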
<P><A name=score></A></P>
<LI><B>The Score statistic</B> is another alternative similar in function to G<FONT size=-2>M</FONT> and is part of SAS's PROC LOGISTIC output.
<P><A name=aic></A></P>
<LI><B>The Akaike Information Criterion, AIC, </B>is another alternative similar in function to G<FONT size=-2>M</FONT> and is part of SAS's PROC LOGISTIC output.
<P><A name=Schwartz></A></P>
<LI><B>The Schwarz criterion</B> is a modified version of AIC and is part of SAS's PROC LOGISTIC output.
<P></P></LI></UL></LI></OL>
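<P>Both AIC and the Schwarz criterion are penalized versions of -2LL. Their standard definitions, which the text does not spell out, can be sketched as follows (Python; the -2LL value, parameter count, and sample size are hypothetical):</P>

```python
import math

# Sketch: the standard definitions of the Akaike Information Criterion
# and the Schwarz criterion (BIC). k = number of estimated parameters
# (including the intercept), n = number of cases; lower is better.

def aic(minus_2ll, k):
    return minus_2ll + 2 * k

def schwarz(minus_2ll, k, n):
    return minus_2ll + k * math.log(n)

# Hypothetical: -2LL = 120.5 for a model with 4 parameters on 200 cases.
print(aic(120.5, 4))           # 128.5
print(schwarz(120.5, 4, 200))  # 120.5 + 4*ln(200)
```

<P>The Schwarz criterion's heavier penalty (k&middot;ln(n) rather than 2k, once n exceeds about 8) is why it tends to favor smaller models than AIC.</P>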
<P><A name=rsquared></A></P>