PA 765: Logistic Regression
- Is multicollinearity a problem for logistic regression the way it is for multiple linear regression? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#multicol)
- What is the logistic equivalent to the VIF test for multicollinearity in OLS regression? Can odds ratios be used? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#vif)
- How does one test to see if the assumption of linearity in the logit is met for each of the independents? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#assumpt)
- How can one use estimated variance of residuals to test for model misspecification? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#specification)
- How are interaction effects handled in logistic regression? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#interact)
- Does stepwise logistic regression exist, as it does for OLS regression? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#stepwise)
- Does analysis of residuals work in logistic regression as it does in OLS? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#residual)
- How many independents can I have? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#indeps)
- How do I express the logistic regression equation if one or more of my independents is categorical? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#paramet)
- How do I compare logit coefficients across groups formed by a categorical independent variable? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#multgroups)
- How do I compute the confidence interval for the unstandardized logit (effect) coefficients? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#confiden)
- SAS's PROC CATMOD for multinomial logistic regression is not user friendly. Where can I get some help? (http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#sas)
Why not just use regression with dichotomous dependents?

When the dependent is binary, the residual error is heteroscedastic, which violates one of the assumptions of regression analysis. Likewise, a binary dependent is not normally distributed, so OLS estimates of the sums of squares are misleading, and the significance tests and standard error of the regression built on them are therefore wrong. Also, for a dependent which takes values of 0 and 1, the linear regression model will allow predicted values below 0 and above 1. Finally, multiple linear regression does not handle nonlinear relationships, whereas log-linear methods do. These objections to the use of regression with dichotomous dependents apply to polytomous dependents as well.
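To make the out-of-range prediction problem concrete, here is a minimal sketch (not from the original Statnotes) fitting both models to simulated binary data with Python's statsmodels; all variable names are illustrative:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))   # true probabilities
y = rng.binomial(1, p)                   # binary (0/1) dependent

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                 # linear probability model
logit = sm.Logit(y, X).fit(disp=0)       # logistic regression

# OLS fitted values can fall below 0 or above 1; logistic ones cannot.
print("OLS range:", ols.predict(X).min(), ols.predict(X).max())
print("Logit range:", logit.predict(X).min(), logit.predict(X).max())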
What is the SPSS syntax for logistic regression?

With SPSS 10, logistic regression is found under Analyze - Regression - Binary Logistic or Multinomial Logistic.

LOGISTIC REGRESSION /VARIABLES income WITH age SES gender opinion1 opinion2 region
  /CATEGORICAL=gender, opinion1, opinion2, region
  /CONTRAST(region)=INDICATOR(4)
  /METHOD FSTEP(LR)
  /CLASSPLOT

Above is the SPSS syntax in simplified form. The dependent variable is the variable immediately after the VARIABLES term. The independent variables are those immediately after the WITH term. The CATEGORICAL command specifies any categorical variables; note these must also be listed in the VARIABLES statement. The CONTRAST command tells SPSS which category of a categorical variable is to be dropped when it automatically constructs dummy variables (here it is the 4th value of "region"; this value is the fourth one in order and is not necessarily coded "4"). The METHOD subcommand sets the method of computation, here specified as FSTEP to indicate forward stepwise logistic regression. Alternatives are BSTEP (backward stepwise logistic regression) and ENTER (enter terms as listed, usually because their order is set by theories which the researcher is testing). ENTER is the default method. The (LR) term following FSTEP specifies that likelihood-ratio criteria are to be used in the stepwise addition of variables to the model. The /CLASSPLOT option specifies that a histogram of predicted probabilities is to be output (see above).
Will SPSS's logistic regression procedure handle my categorical variables automatically?

No, at least through Version 8. You must declare your categorical variables categorical if they have more than two values.
Can I handle missing cases the same way in logistic regression as in OLS regression?

No. In the linear model assumed by OLS regression, one may choose to estimate missing values of a variable by regressing it on the non-missing data. However, the nonlinear model assumed by logistic regression requires a full set of data. Therefore SPSS provides only for LISTWISE deletion of cases with missing data, using the remaining complete dataset to calculate the logistic parameters.
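As an illustration, listwise deletion before a logistic fit might look like the following sketch (pandas/statsmodels; the file and column names are hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")                       # hypothetical dataset
cols = ["vote", "age", "income", "education"]
complete = df[cols].dropna()                         # listwise deletion of missing cases
model = smf.logit("vote ~ age + income + education", data=complete).fit(disp=0)
print(model.summary())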
Is it true for logistic regression, as it is for OLS regression, that the beta weight (standardized logit coefficient) for a given independent reflects its explanatory power controlling for other variables in the equation, and that the betas will change if variables are added to or dropped from the equation?

Yes, the same basic logic applies. This is why it is best in either form of regression to compare two or more models for their relative fit to the data, rather than simply to show the data are not inconsistent with a single model. The model, of course, dictates which variables are entered, and one uses the ENTER method in SPSS, which is the default method.
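A sketch of such a model comparison, continuing the hypothetical dataframe from the earlier sketch: fit a smaller and a larger model, note how the shared coefficients shift, and compare relative fit with a likelihood-ratio test.

import statsmodels.formula.api as smf
from scipy import stats

m1 = smf.logit("vote ~ age + income", data=complete).fit(disp=0)
m2 = smf.logit("vote ~ age + income + education", data=complete).fit(disp=0)

print(m1.params)                           # betas for age and income
print(m2.params)                           # generally shift once education enters

lr = 2 * (m2.llf - m1.llf)                 # likelihood-ratio chi-square
p_value = stats.chi2.sf(lr, df=m2.df_model - m1.df_model)
print(lr, p_value)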
What is the coefficient in logistic regression which corresponds to R-square in multiple regression?

There is no exactly analogous coefficient. See the discussion of R_L-squared, above. Cox and Snell's R-square is an attempt to imitate the interpretation of multiple R-square, and Nagelkerke's R-square is a further modification of the Cox and Snell coefficient to assure that it can vary from 0 to 1.
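Both coefficients can be computed directly from the fitted and null log-likelihoods; a sketch (statsmodels Logit results expose llf, llnull, and nobs):

import numpy as np

def pseudo_r2(result):
    """Cox and Snell's and Nagelkerke's R-square from a fitted Logit result."""
    n = result.nobs
    cox_snell = 1 - np.exp((2.0 / n) * (result.llnull - result.llf))
    max_cox_snell = 1 - np.exp((2.0 / n) * result.llnull)  # upper bound, below 1
    nagelkerke = cox_snell / max_cox_snell                 # rescaled to reach 1
    return cox_snell, nagelkerke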
Is there a logistic regression analogy to adjusted R-square in OLS regression?

Yes. R_LA-squared is adjusted R_L-squared, and is similar to adjusted R-square in OLS regression. R_LA-squared penalizes R_L-squared for the number of independents, on the assumption that R-square will become artificially high simply because some independents' chance variations "explain" small parts of the variance of the dependent. R_LA-squared = (G_M - 2k)/D_O, where k = the number of independents.
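On the usual Statnotes reading of these symbols, G_M is the model chi-square and D_O the deviance of the baseline (intercept-only) model, i.e. -2 times the null log-likelihood; under that assumption, the formula can be sketched as:

def adjusted_rl2(result):
    """R_LA-squared = (G_M - 2k) / D_O, under the symbol reading described above."""
    g_m = 2 * (result.llf - result.llnull)   # model chi-square
    d_o = -2 * result.llnull                 # baseline (null) deviance
    k = result.df_model                      # number of independents
    return (g_m - 2 * k) / d_o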
Is multicollinearity a problem for logistic regression the way it is for multiple linear regression?

Absolutely. The discussion in "Statnotes" under the "Regression" topic is relevant to logistic regression as well.
What is the logistic equivalent to the VIF test for multicollinearity in OLS regression? Can odds ratios be used?

A high variance inflation factor (VIF) does indeed indicate a problem in OLS regression. VIF is the reciprocal of tolerance, where tolerance is 1 - R-squared for the regression of a given independent on all the other independents. When there is high multicollinearity, that R-squared will be high, so tolerance will be low and VIF will be high. When VIF is high, the b and beta weights are unreliable and subject to misinterpretation. For typical social science research, where that R-squared is often not higher than .75, the VIF will be no higher than 1/(1 - .75) = 4, and the standard error of b (or beta) will therefore be inflated by a factor of no more than sqrt(4) = 2. A sketch of this tolerance-based check follows.
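Because tolerance and VIF are computed from regressions among the independents only, the same diagnostic can be run on the predictors of a logistic model; a sketch using statsmodels' variance_inflation_factor and the hypothetical columns from the earlier sketch:

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(complete[["age", "income", "education"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))  # VIF = 1/tolerance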
As there is no direct counterpart to R-squared in logistic regression, VIF is not computed (that I have seen, though obviously one could apply the same logic to the various pseudo-R-squared measures).

The odds ratio is a measure of association, consisting of one odds divided by another odds. Odds ratios below 1.0 are associated with decreases in the dependent variable, while odds ratios above 1.0 are associated with increases. Note the asymmetry: 0 to 1 for decreases, but 1 to infinity for increases. To eliminate this asymmetry, we compute the logit of the dependent variable, which is the natural logarithm of the odds ratio. Logit(Y) becomes negative and increasingly large in magnitude as the odds ratio decreases from 1 to 0, and becomes increasingly large and positive as the odds ratio increases from 1 to infinity. For example, an odds ratio of 2.0 corresponds to a logit of ln(2.0) = +0.69, while its reciprocal 0.5 corresponds to ln(0.5) = -0.69, restoring the symmetry.
The larger the logit is in one direction or the other, the stronger the association of the independent variable with the dependent. To compare the effects of multiple independents, one would use the standardized logit coefficients, much like betas in OLS regression. Interpretation of these could be unreliable if multicollinearity is high.

To the extent that one independent is linearly related to another independent, multicollinearity could be a problem in logistic regression. However, unlike OLS regression, logistic regression does not assume linearity of relationship among the independents. The Box-Tidwell transformation and orthogonal polynomial contrasts are ways of testing linearity among the independents.

A high odds ratio would not be evidence of multicollinearity in itself. Unfortunately, I am not aware of a VIF-type test for logistic regression, and I would think that the same obstacles would exist as for creating a true equivalent to OLS R-squared.
How does one test to see if the assumption of linearity in the logit is met for each of the independents?

- Box-Tidwell transformation: Add to the logistic model interaction terms which are the crossproduct of each independent times its natural logarithm [(X)ln(X)]. If these terms are significant, then there is nonlinearity in the logit. This method is not sensitive to small nonlinearities (a code sketch follows this list).

- Orthogonal polynomial contrasts, an option in SPSS, may be used. This option treats each independent as a categorical variable and computes logit (effect) coefficients for each category, testing for linear, quadratic, cubic, or higher-order effects. The logit should not change over the contrasts. This method is not appropriate when the independent has a large number of values, as this inflates the standard errors of the contrasts.
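A sketch of the Box-Tidwell check described in the first item above, again with the hypothetical dataframe from earlier (the transformation requires strictly positive X):

import numpy as np
import statsmodels.formula.api as smf

complete = complete.assign(
    age_ln=complete["age"] * np.log(complete["age"]),
    income_ln=complete["income"] * np.log(complete["income"]),
)
bt = smf.logit("vote ~ age + income + age_ln + income_ln", data=complete).fit(disp=0)
print(bt.summary())   # significant *_ln terms indicate nonlinearity in the logit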
How can one use estimated variance of residuals to test for model misspecification?

The misspecification problem may be assessed by comparing the expected variance of the residuals with their observed variance. Since logistic regression assumes binomial errors, the estimated variance is var(y) = m(1 - m), where m is the estimated mean of y. "Overdispersion" is when the observed variance of the residuals is larger than this expected binomial variance.
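One common way to operationalize this comparison is the Pearson dispersion statistic, which should be near 1 when the binomial variance assumption holds; a sketch with statsmodels' GLM (hypothetical names as before):

import statsmodels.api as sm
import statsmodels.formula.api as smf

glm = smf.glm("vote ~ age + income + education", data=complete,
              family=sm.families.Binomial()).fit()
dispersion = glm.pearson_chi2 / glm.df_resid   # values well above 1 suggest overdispersion
print(dispersion)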