<LI><B>R-squared</B>. There is no widely-accepted direct analog to OLS regression's R<SUP>2</SUP>. This is because an R<SUP>2</SUP> measure seeks to make a statement about the "percent of variance explained," but the variance of a dichotomous or categorical dependent variable depends on the frequency distribution of that variable. For a dichotomous dependent variable, for instance, variance is at a maximum for a 50-50 split, and the more lopsided the split, the lower the variance. This means that R-squared measures for logistic regressions with differing marginal distributions of their respective dependent variables cannot be compared directly, and comparison of logistic R-squared measures with R<SUP>2</SUP> from OLS regression is also problematic. Nonetheless, a number of logistic R-squared measures have been proposed.
<P>Note that the R<SUP>2</SUP>-like measures below are <U>not</U> goodness-of-fit tests but rather attempts to measure strength of association. For small samples, for instance, an R<SUP>2</SUP>-like measure might be high even when goodness of fit is unacceptable by model chi-square or some other test.
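<P>To see why the 50-50 split maximizes variance, note that the variance of a 0/1 variable is p(1 - p), where p is the proportion of cases coded 1. A minimal Python sketch (illustrative only, not part of the SPSS/SAS material here):</P>
<PRE>
# Variance of a dichotomous (0/1) variable as a function of p, the proportion of 1's
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(p * (1 - p), 3))   # peaks at 0.25 when p = 0.5
</PRE>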
<P>
<OL>
<LI><B>R<FONT size=-2>L</FONT>-squared</B> is the proportionate reduction in chi-square and is also the proportionate reduction in the absolute value of the log-likelihood coefficient. R<FONT size=-2>L</FONT>-squared shows how much the inclusion of the independent variables in the logistic regression model reduces the badness-of-fit D<FONT size=-2>O</FONT> coefficient. R<FONT size=-2>L</FONT>-squared varies from 0 to 1, where 0 indicates the independents have no usefulness in predicting the dependent. R<FONT size=-2>L</FONT>-squared = G<FONT size=-2>M</FONT>/D<FONT size=-2>O</FONT>. R<FONT size=-2>L</FONT>-squared often underestimates the proportion of variation explained in the underlying continuous (dependent) variable (see DeMaris, 1992: 54). As of version 7.5, R<FONT size=-2>L</FONT>-squared was not part of SPSS output but can be calculated by this formula (see the computational sketch following this list).
<P><A name=Cox></A></P>
<LI><B>Cox and Snell's R-Square</B> is an attempt to imitate the interpretation of multiple R-Square based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. It is part of SPSS output.
<P><A name=Nagel></A></P>
<LI><B>Nagelkerke's R-Square</B> is a further modification of the Cox and Snell coefficient to assure that it can vary from 0 to 1. That is, Nagelkerke's R<SUP>2</SUP> divides Cox and Snell's R<SUP>2</SUP> by its maximum in order to achieve a measure that ranges from 0 to 1. It is part of SPSS output. See Nagelkerke (1991).
<P><A name=pseudo></A></P>
<LI><B>Pseudo-R-square</B> is Aldrich and Nelson's coefficient, which serves as an analog to the squared contingency coefficient, with an interpretation like R-square. Its maximum is less than 1. It may be used in either dichotomous or multinomial logistic regression.
<P></P>
<LI><B>R-square</B> is OLS R-square, which can be used in dichotomous logistic regression (see Menard, p. 23) but not in multinomial logistic regression. To obtain R-square, save the predicted values from logistic regression and run a bivariate regression on the observed dependent values. Note that logistic regression can yield deceptively high R<SUP>2</SUP> values when you have many variables relative to the number of cases, keeping in mind that the number of variables includes k-1 dummy variables for every categorical independent variable having k categories. </LI></OL>
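<P>The likelihood-based measures above can all be computed from the null and model deviances reported in logistic regression output. A minimal Python sketch of the arithmetic; the sample size and deviance values below are hypothetical numbers chosen for illustration, not output from any actual model:</P>
<PRE>
import numpy as np

# Hypothetical values from logistic regression output:
n   = 200      # sample size (assumed)
D_O = 270.0    # null deviance, -2LL of the intercept-only model (assumed)
D_M = 210.0    # model deviance, -2LL of the fitted model (assumed)

G_M = D_O - D_M                      # model chi-square
r_l = G_M / D_O                      # R_L-squared: proportionate reduction in -2LL
cox_snell  = 1 - np.exp(-G_M / n)    # Cox and Snell's R-square
max_cs     = 1 - np.exp(-D_O / n)    # its maximum, less than 1.0
nagelkerke = cox_snell / max_cs      # Nagelkerke's R-square, rescaled to [0, 1]

print(f"R_L^2 = {r_l:.3f}, Cox-Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
</PRE>
<P>Note how Nagelkerke's coefficient is simply Cox and Snell's divided by the maximum value Cox and Snell's measure could attain for these data.</P>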
<P><A name=multi></A></P>
<LI><B>Ordinal</B> and <B>Multinomial logistic regression</B> are extensions of logistic regression that allow the simultaneous comparison of more than one contrast. That is, the log odds of three or more contrasts are estimated simultaneously (ex., the probability of A vs. B, A vs. C, B vs. C, etc.; a small numeric illustration follows below). Multinomial logistic regression was supported by SPSS starting with Version 9 and ordinal logistic regression starting with Version 10. For earlier versions, note that the SPSS LOGISTIC REGRESSION procedure will not handle polytomous dependent variables. However, SPSS's LOGLINEAR procedure will handle multinomial logistic regression if all the independents are categorical. If there are any continuous variables, though, LOGLINEAR (available only in syntax) treats them as "cell covariates," assigning the cell mean to each case for each continuous independent. This is not the same as and will give different results from multinomial logistic regression.
<P><B>SAS's PROC CATMOD </B>computes both simple and multinomial logistic regression, whereas PROC LOGIST is for simple (dichotomous) logistic regression. CATMOD uses a conventional model command: ex., model wsat*supsat*qman=_response_ /nogls ml;. Note that in the model command, nogls suppresses generalized least squares estimation and ml specifies maximum likelihood estimation. </P></LI></UL>
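<P>To make the "simultaneous contrasts" idea concrete, the following Python sketch works through the log-odds arithmetic for one case with hypothetical predicted probabilities over three outcome categories; it is an illustration of the contrasts, not SPSS or SAS output:</P>
<PRE>
import numpy as np

# Hypothetical predicted probabilities for one case over three
# outcome categories A, B, C (assumed values; they sum to 1).
p = {"A": 0.5, "B": 0.3, "C": 0.2}

# Multinomial logistic regression estimates these contrasts simultaneously:
log_odds_AB = np.log(p["A"] / p["B"])
log_odds_AC = np.log(p["A"] / p["C"])
log_odds_BC = np.log(p["B"] / p["C"])

# The contrasts are internally consistent: ln(A/B) = ln(A/C) - ln(B/C).
assert np.isclose(log_odds_AB, log_odds_AC - log_odds_BC)
print(log_odds_AB, log_odds_AC, log_odds_BC)
</PRE>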
<P>
<P><BR><A name=assume></A>
<H2>Assumptions</H2>
<UL>
<LI>Logistic regression is popular in part because it enables the researcher to overcome many of the restrictive assumptions of OLS regression:
<P>
<OL>
<LI>Logistic regression does not assume a linear relationship between the dependents and the independents. It may handle nonlinear effects even when exponential and polynomial terms are not explicitly added as additional independents because the logit link function on the left-hand side of the logistic regression equation is non-linear (see the sketch following this list). However, it is also possible and permitted to add explicit interaction and power terms as variables on the right-hand side of the logistic equation, as in OLS regression.
<LI>The dependent variable need not be normally distributed.
<LI>The dependent variable need not be homoscedastic for each level of the independent(s).
<LI>Normally distributed error terms are not assumed.
<LI>Logistic regression does not require that the independents be interval.
<LI>Logistic regression does not require that the independents be unbounded.
</LI></OL>
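<P>A minimal Python sketch of the logit link referred to in item 1: the model is linear on the log-odds scale, yet the implied probability curve is nonlinear (S-shaped) in x. The coefficients and x values below are hypothetical:</P>
<PRE>
import numpy as np

b0, b1 = -2.0, 0.8           # hypothetical logit coefficients
x = np.array([0.0, 2.5, 5.0, 7.5, 10.0])

logit = b0 + b1 * x          # linear on the logit (log-odds) scale
p = 1 / (1 + np.exp(-logit)) # nonlinear on the probability scale

print(np.round(logit, 2))    # equal steps of 2.0 in log-odds
print(np.round(p, 3))        # unequal steps in probability: 0.119, 0.5, 0.881, ...
</PRE>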
<P><A name=assume2></A></P>
<LI>However, other assumptions of OLS regression still apply:
<P>
<OL>
<LI><B>Inclusion of all relevant variables in the regression model</B>: If relevant variables are omitted, the common variance they share with included variables may be wrongly attributed to those variables, or the error term may be inflated.
<P></P>
<LI><B>Exclusion of all irrelevant variables</B>: If causally irrelevant variables are included in the model, the common variance they share with included variables may be wrongly attributed to the irrelevant variables. The more the correlation of the irrelevant variable(s) with other independents, the greater the standard errors of the regression coefficients for these independents.
<P></P>
<LI><B>Error terms are assumed to be independent</B>. Violations of this assumption can have serious effects. Violations are apt to occur, for instance, in correlated samples, such as before-after or matched-pairs studies. That is, each subject should contribute only a single observation, not multiple observations at different time points.
<P></P>
<LI><B>Linearity</B>. Logistic regression does not require linear relationships between the independents and the dependent, as does OLS regression, but it does assume a linear relationship between the independents and the logit of the dependent.
<P></P>
<LI><B>Additivity</B>. Like OLS regression, logistic regression does not account for interaction effects except when interaction terms (usually products of standardized independents) are created as additional variables in the analysis.
<P></P>
<LI><B>Independents are not linear functions of each other</B>: To the extent that one independent is a linear function of another independent, the problem of multicollinearity will occur in logistic regression, as it does in OLS regression. As the independents increase in correlation with each other, the standard errors of the logit (effect) coefficients will become inflated. Multicollinearity does not change the estimates of the coefficients, only their reliability. Multicollinearity and its handling are discussed more extensively in the StatNotes section on multiple regression.
<P></P>
<LI><B>Large samples</B>. Unlike OLS regression, logistic regression uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to derive parameters. MLE relies on large-sample asymptotic normality, which means that the reliability of estimates declines when there are few cases for each observed combination of X variables.
<P></P>
<LI><B>Expected dispersion</B>. In logistic regression the expected variance of the dependent can be compared to the observed variance, and discrepancies may be considered under- or overdispersion. If there is moderate discrepancy, standard errors will be over-optimistic and one should use adjusted standard errors, which will make the confidence intervals wider. However, if there are large discrepancies, this indicates a need to respecify the model, or that the sample was not random, or other serious design problems. The expected variance is ybar*(1 - ybar), where ybar is the mean of the fitted (estimated) y. This can be compared with the actual variance in observed y to assess under- or overdispersion. Adjusted SE equals SE * SQRT(D/df), where D is the scaled deviance, which for logistic regression is -2LL (the -2 Log Likelihood in SPSS logistic regression output). A computational sketch follows this list. </LI></OL>
<P></P></LI></UL>
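<P>A minimal Python sketch of the dispersion check and the adjusted-SE formula just described. All numbers here (fitted probabilities, outcomes, SE, deviance, df) are hypothetical values for illustration, not output from any actual model:</P>
<PRE>
import numpy as np

# Hypothetical fitted probabilities and observed 0/1 outcomes.
y_hat = np.array([0.2, 0.4, 0.7, 0.9, 0.3, 0.6])  # fitted (assumed)
y     = np.array([0,   1,   1,   1,   0,   1  ])  # observed (assumed)

ybar = y_hat.mean()
expected_var = ybar * (1 - ybar)   # expected variance, ybar*(1 - ybar)
observed_var = y.var()             # actual variance of observed y
print(expected_var, observed_var)  # a large gap suggests under-/overdispersion

# Adjusted SE = SE * SQRT(D/df), where D is the scaled deviance (-2LL).
se, D, df = 0.15, 210.0, 180       # hypothetical values from model output
se_adj = se * np.sqrt(D / df)
print(se_adj)                      # yields wider confidence intervals than the raw SE
</PRE>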
<P><BR><BR>
<H2>SPSS Output for Logistic Regression</H2>
<UL>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logispss.htm"><B>Commented SPSS Output for Logistic Regression</B></A> </LI></UL>
<P><BR><BR>
<H2>Frequently Asked Questions</H2>
<UL>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#regress"><B>Why not just use regression with dichotomous dependents?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#spss"><B>What is the SPSS syntax for logistic regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#catvars"><B>Will SPSS's logistic regression procedure handle my categorical variables automatically?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#missing"><B>Can I handle missing cases the same in logistic regression as in OLS regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#beta"><B>Is it true for logistic regression, as it is for OLS regression, that the beta weight (standardized logit coefficient) for a given independent reflects its explanatory power controlling for other variables in the equation, and that the betas will change if variables are added or dropped from the equation?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#rsquare"><B>What is the coefficient in logistic regression which corresponds to R-Square in multiple regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#arsquare"><B>Is there a logistic regression analogy to adjusted R-square in OLS regression?</B></A>
<LI><A href="http://www2.chass.ncsu.edu/garson/pa765/logistic.htm#multicol"><B>Is multicollinearity a problem for logistic regression the way it is for OLS regression?</B></A> </LI></UL>