<LI><B>R-squared</B>. There is no widely-accepted direct analog to OLS regression's R<SUP>2</SUP>. This is because an R<SUP>2</SUP> measure seeks to make a statement about the "percent of variance explained," but the variance of a dichotomous or categorical dependent variable depends on the frequency distribution of that variable. For a dichotomous dependent variable, for instance, variance is at a maximum for a 50-50 split, and the more lopsided the split, the lower the variance. This means that R-squared measures for logistic regressions with differing marginal distributions of their respective dependent variables cannot be compared directly, and comparison of logistic R-squared measures with R<SUP>2</SUP> from OLS regression is also problematic. Nonetheless, a number of logistic R-squared measures have been proposed.
<P>Note that the R<SUP>2</SUP>-like measures below are <U>not</U> goodness-of-fit tests but rather attempts to measure strength of association. For small samples, for instance, an R<SUP>2</SUP>-like measure might be high even when goodness of fit is unacceptable by model chi-square or some other test.
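<P>To see why the 50-50 split maximizes variance, note that the variance of a 0/1 variable is p(1 - p), where p is the proportion of cases coded 1. A minimal Python sketch (illustrative only, not part of the SPSS/SAS material here):</P>
<PRE>
# Variance of a dichotomous (0/1) variable as a function of p, the proportion of 1's
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(p * (1 - p), 3))   # peaks at 0.25 when p = 0.5
</PRE>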
<P>
<OL>
<LI><B>R<FONT size=-2>L</FONT>-squared</B> is the proportionate reduction in chi-square and is also the proportionate reduction in the absolute value of the log-likelihood coefficient. R<FONT size=-2>L</FONT>-squared shows how much the inclusion of the independent variables in the logistic regression model reduces the badness-of-fit D<FONT size=-2>O</FONT> coefficient. R<FONT size=-2>L</FONT>-squared varies from 0 to 1, where 0 indicates the independents have no usefulness in predicting the dependent. R<FONT size=-2>L</FONT>-squared = G<FONT size=-2>M</FONT>/D<FONT size=-2>O</FONT>. R<FONT size=-2>L</FONT>-squared often underestimates the proportion of variation explained in the underlying continuous (dependent) variable (see DeMaris, 1992: 54). As of version 7.5, R<FONT size=-2>L</FONT>-squared was not part of SPSS output but can be calculated by this formula (see the computational sketch following this list).
<P><A name=Cox></A></P>
<LI><B>Cox and Snell's R-Square</B> is an attempt to imitate the interpretation of multiple R-Square based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. It is part of SPSS output.
<P><A name=Nagel></A></P>
<LI><B>Nagelkerke's R-Square</B> is a further modification of the Cox and Snell coefficient to assure that it can vary from 0 to 1. That is, Nagelkerke's R<SUP>2</SUP> divides Cox and Snell's R<SUP>2</SUP> by its maximum in order to achieve a measure that ranges from 0 to 1. It is part of SPSS output. See Nagelkerke (1991).
<P><A name=pseudo></A></P>
<LI><B>Pseudo-R-square</B> is Aldrich and Nelson's coefficient, which serves as an analog to the squared contingency coefficient, with an interpretation like R-square. Its maximum is less than 1. It may be used in either dichotomous or multinomial logistic regression.
<P></P>
<LI><B>R-square</B> is OLS R-square, which can be used in dichotomous logistic regression (see Menard, p. 23) but not in multinomial logistic regression. To obtain R-square, save the predicted values from logistic regression and run a bivariate regression on the observed dependent values. Note that logistic regression can yield deceptively high R<SUP>2</SUP> values when you have many variables relative to the number of cases, keeping in mind that the number of variables includes k-1 dummy variables for every categorical independent variable having k categories. </LI></OL>
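<P>The likelihood-based measures above can all be computed from the null and model deviances reported in logistic regression output. A minimal Python sketch of the arithmetic; the sample size and deviance values below are hypothetical numbers chosen for illustration, not output from any actual model:</P>
<PRE>
import numpy as np

# Hypothetical values from logistic regression output:
n   = 200      # sample size (assumed)
D_O = 270.0    # null deviance, -2LL of the intercept-only model (assumed)
D_M = 210.0    # model deviance, -2LL of the fitted model (assumed)

G_M = D_O - D_M                      # model chi-square
r_l = G_M / D_O                      # R_L-squared: proportionate reduction in -2LL
cox_snell  = 1 - np.exp(-G_M / n)    # Cox and Snell's R-square
max_cs     = 1 - np.exp(-D_O / n)    # its maximum, less than 1.0
nagelkerke = cox_snell / max_cs      # Nagelkerke's R-square, rescaled to [0, 1]

print(f"R_L^2 = {r_l:.3f}, Cox-Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
</PRE>
<P>Note how Nagelkerke's coefficient is simply Cox and Snell's divided by the maximum value Cox and Snell's measure could attain for these data.</P>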
<P><A name=multi></A></P>
<LI><B>Ordinal</B> and <B>Multinomial logistic regression</B> are extensions of logistic regression that allow the simultaneous comparison of more than one contrast. That is, the log odds of three or more contrasts are estimated simultaneously (ex., the probability of A vs. B, A vs. C, B vs. C, etc.; a small numeric illustration follows below). Multinomial logistic regression was supported by SPSS starting with Version 9 and ordinal logistic regression starting with Version 10. For earlier versions, note that the SPSS LOGISTIC REGRESSION procedure will not handle polytomous dependent variables. However, SPSS's LOGLINEAR procedure will handle multinomial logistic regression if all the independents are categorical. If there are any continuous variables, though, LOGLINEAR (available only in syntax) treats them as "cell covariates," assigning the cell mean to each case for each continuous independent. This is not the same as and will give different results from multinomial logistic regression.
<P><B>SAS's PROC CATMOD </B>computes both simple and multinomial logistic regression, whereas PROC LOGIST is for simple (dichotomous) logistic regression. CATMOD uses a conventional model command: ex., model wsat*supsat*qman=_response_ /nogls ml;. Note that in the model command, nogls suppresses generalized least squares estimation and ml specifies maximum likelihood estimation. </P></LI></UL>
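<P>To make the "simultaneous contrasts" idea concrete, the following Python sketch works through the log-odds arithmetic for one case with hypothetical predicted probabilities over three outcome categories; it is an illustration of the contrasts, not SPSS or SAS output:</P>
<PRE>
import numpy as np

# Hypothetical predicted probabilities for one case over three
# outcome categories A, B, C (assumed values; they sum to 1).
p = {"A": 0.5, "B": 0.3, "C": 0.2}

# Multinomial logistic regression estimates these contrasts simultaneously:
log_odds_AB = np.log(p["A"] / p["B"])
log_odds_AC = np.log(p["A"] / p["C"])
log_odds_BC = np.log(p["B"] / p["C"])

# The contrasts are internally consistent: ln(A/B) = ln(A/C) - ln(B/C).
assert np.isclose(log_odds_AB, log_odds_AC - log_odds_BC)
print(log_odds_AB, log_odds_AC, log_odds_BC)
</PRE>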
<P>
<P><BR><A name=assume></A>
<H2>Assumptions</H2>
<UL>
<LI>Logistic regression is popular in part because it enables the researcher to overcome many of the restrictive assumptions of OLS regression:
<P>
<OL>
<LI>Logistic regression does not assume a linear relationship between the dependents and the independents. It may handle nonlinear effects even when exponential and polynomial terms are not explicitly added as additional independents because the logit link function on the left-hand side of the logistic regression equation is non-linear (see the sketch following this list). However, it is also possible and permitted to add explicit interaction and power terms as variables on the right-hand side of the logistic equation, as in OLS regression.
<LI>The dependent variable need not be normally distributed.
<LI>The dependent variable need not be homoscedastic for each level of the independent(s).
<LI>Normally distributed error terms are not assumed.
<LI>Logistic regression does not require that the independents be interval.
<LI>Logistic regression does not require that the independents be unbounded.
</LI></OL>
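<P>A minimal Python sketch of the logit link referred to in item 1: the model is linear on the log-odds scale, yet the implied probability curve is nonlinear (S-shaped) in x. The coefficients and x values below are hypothetical:</P>
<PRE>
import numpy as np

b0, b1 = -2.0, 0.8           # hypothetical logit coefficients
x = np.array([0.0, 2.5, 5.0, 7.5, 10.0])

logit = b0 + b1 * x          # linear on the logit (log-odds) scale
p = 1 / (1 + np.exp(-logit)) # nonlinear on the probability scale

print(np.round(logit, 2))    # equal steps of 2.0 in log-odds
print(np.round(p, 3))        # unequal steps in probability: 0.119, 0.5, 0.881, ...
</PRE>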
<P><A name=assume2></A></P>
<LI>However, other assumptions of OLS regression still apply:
<P>
<OL>
<LI><B>Inclusion of all relevant variables in the regression model</B>: If relevant variables are omitted, the common variance they share with included variables may be wrongly attributed to those variables, or the error term may be inflated.
<P></P>
<LI><B>Exclusion of all irrelevant variables</B>: If causally irrelevant variables are included in the model, the common variance they share with included variables may be wrongly attributed to the irrelevant variables. The more the correlation of the irrelevant variable(s) with other independents, the greater the standard errors of the regression coefficients for these independents.
<P></P>
<LI><B>Error terms are assumed to be independent</B>. Violations of this assumption can have serious effects. Violations are apt to occur, for instance, in correlated samples, such as before-after or matched-pairs studies. That is, each subject should contribute only a single observation, not multiple observations at different time points.
<P></P>
<LI><B>Linearity</B>. Logistic regression does not require linear relationships between the independents and the dependent, as does OLS regression, but it does assume a linear relationship between the independents and the logit of the dependent.
<P></P>
<LI><B>Additivity</B>. Like OLS regression, logistic regression does not account for interaction effects except when interaction terms (usually products of standardized independents) are created as additional variables in the analysis.
<P></P>
<LI><B>Independents are not linear functions of each other</B>: To the extent that one independent is a linear function of another independent, the problem of multicollinearity will occur in logistic regression, as it does in OLS regression. As the independents increase in correlation with each other, the standard errors of the logit (effect) coefficients will become inflated. Multicollinearity does not change the estimates of the coefficients, only their reliability. Multicollinearity and its handling are discussed more extensively in the StatNotes section on multiple regression.
<P></P>
<LI><B>Large samples</B>. Unlike OLS regression, logistic regression uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to derive parameters. MLE relies on large-sample asymptotic normality, which means that the reliability of estimates declines when there are few cases for each observed combination of X variables.
<P></P>
<LI><B>Expected dispersion</B>. In logistic regression the expected variance of the dependent can be compared to the observed variance, and discrepancies may be considered under- or overdispersion. If there is moderate discrepancy, standard errors will be over-optimistic and one should use adjusted standard errors, which will make the confidence intervals wider. However, if there are large discrepancies, this indicates a need to respecify the model, or that the sample was not random, or other serious design problems. The expected variance is ybar*(1 - ybar), where ybar is the mean of the fitted (estimated) y. This can be compared with the actual variance in observed y to assess under- or overdispersion. Adjusted SE equals SE * SQRT(D/df), where D is the scaled deviance, which for logistic regression is -2LL (the -2 Log Likelihood in SPSS logistic regression output). A computational sketch follows this list. </LI></OL>
<P></P></LI></UL>
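<P>A minimal Python sketch of the dispersion check and the adjusted-SE formula just described. All numbers here (fitted probabilities, outcomes, SE, deviance, df) are hypothetical values for illustration, not output from any actual model:</P>
<PRE>
import numpy as np

# Hypothetical fitted probabilities and observed 0/1 outcomes.
y_hat = np.array([0.2, 0.4, 0.7, 0.9, 0.3, 0.6])  # fitted (assumed)
y     = np.array([0,   1,   1,   1,   0,   1  ])  # observed (assumed)

ybar = y_hat.mean()
expected_var = ybar * (1 - ybar)   # expected variance, ybar*(1 - ybar)
observed_var = y.var()             # actual variance of observed y
print(expected_var, observed_var)  # a large gap suggests under-/overdispersion

# Adjusted SE = SE * SQRT(D/df), where D is the scaled deviance (-2LL).
se, D, df = 0.15, 210.0, 180       # hypothetical values from model output
se_adj = se * np.sqrt(D / df)
print(se_adj)                      # yields wider confidence intervals than the raw SE
</PRE>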
<P><BR><BR>
<H2>SPSS Output for Logistic Regression</H2>
<UL>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logispss.htm"><B>Commented SPSS Output for Logistic Regression</B></A> </LI></UL>
<P><BR><BR>
<H2>Frequently Asked Questions</H2>
<UL>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#regress"><B>Why not just use regression with dichotomous dependents?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#spss"><B>What is the SPSS syntax for logistic regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#catvars"><B>Will SPSS's logistic regression procedure handle my categorical variables automatically?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#missing"><B>Can I handle missing cases the same in logistic regression as in OLS regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#beta"><B>Is it true for logistic regression, as it is for OLS regression, that the beta weight (standardized logit coefficient) for a given independent reflects its explanatory power controlling for other variables in the equation, and that the betas will change if variables are added or dropped from the equation?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#rsquare"><B>What is the coefficient in logistic regression which corresponds to R-Square in multiple regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#arsquare"><B>Is there a logistic regression analogy to adjusted R-square in OLS regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#multicol"><B>Is multicollinearity a problem for logistic regression the way it is for OLS regression?</B></A> </LI></UL>