<P><B>logistic regression 1st lecture</B></P>
<BLOCKQUOTE>
<P>There is another way to look at this printout that is not about probabilities. To the extreme right of the coefficient (the log odds ratio), we see 0.922. We can say that a one point increase in SurvRate reduces the <I><U>log odds</U></I> of death by .0812. And <I>e</I><SUP>-.0812</SUP> = 0.922. Thus a one point increase in SurvRate <I>multiplies</I> the <I>odds</I> of death by .922.</P>
<P>In other words, the entry to the right is exp(<I>b</I>), the odds ratio.</P>
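The step from a coefficient to an odds ratio is just exponentiation, and can be checked in a couple of lines of Python (a sketch using only the numbers quoted above):

```python
import math

# Coefficient for SurvRate from the one-predictor model, as quoted in the text
b_survrate = -0.0812

# Exponentiating the log odds coefficient gives the odds ratio (SPSS's Exp(B))
odds_ratio = math.exp(b_survrate)
print(round(odds_ratio, 3))  # 0.922
```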
<P>Notice also that we have a test on the significance of the coefficients. This test is labeled "Wald," and it is (sort of) a chi-square test. (It isn't exactly distributed as chi-square, but nearly.)</P>
<BLOCKQUOTE>
<P>Here Wald = 17.7558 on 1 df, which is significant.</P></BLOCKQUOTE>
<P>Notice that the Wald chi-square, which asks if SurvRate is significant, doesn't agree with the change in chi-square, which asks if adding SurvRate leads to a better fit. Blame this on Wald: it is not a great test, and it tends to be conservative.</P></BLOCKQUOTE><B>
<P>Predicting Group Membership.</P></B>
<BLOCKQUOTE>
<P>We could make a prediction for every subject, and then put each subject with <I>p</I> &gt; .50 in the "non-survival" group, and everyone with <I>p</I> &lt; .50 in the "survival" group. This is shown below.</P></BLOCKQUOTE><FONT face=Monaco size=1>
<P><IMG alt="classif.gif (2550 bytes)" height=218 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/classif.gif" width=643></P></FONT>
<BLOCKQUOTE>
<P>In this figure we have shown what actually happened, and you can see that a few people with a predicted probability less than .50 actually got worse, and a few above .50 actually got better. But there isn't a huge difference between predicted and actual.</P>
<P>SPSS actually gives us a table of outcomes in the printout, and this shows that 86.36% of the predictions were correct.</P>
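The cutoff rule and the percent-correct figure can be sketched in a few lines of Python. The probabilities and outcomes below are made up for illustration; they are not the Epping-Jordan et al. data:

```python
# A sketch of the .50 cutoff rule with hypothetical data.
def classify(probs, cutoff=0.50):
    """Assign each case to the 'worse' group (1) when its predicted p exceeds the cutoff."""
    return [1 if p > cutoff else 0 for p in probs]

def percent_correct(predicted, observed):
    """Percent of cases where the predicted group matches the observed outcome."""
    hits = sum(1 for pr, ob in zip(predicted, observed) if pr == ob)
    return 100.0 * hits / len(observed)

probs    = [0.10, 0.35, 0.62, 0.80, 0.45, 0.91]   # hypothetical predicted p(worse)
observed = [0,    0,    1,    1,    1,    1]       # hypothetical actual outcomes
print(percent_correct(classify(probs), observed))  # one case (p = .45) is misclassified
```

Note that predicting the modal category for everyone, as in the HTGT example below the table, would also score well on this metric, which is exactly why percent correct alone can mislead.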
<P>Classification tables usually have an important "feel good" component, but that can be very misleading. It is easy to come up with data where almost everyone survives; then all we have to do to get a great percent correct is to predict that <I>everyone</I> will survive. We will be pretty accurate, but not particularly astute.</P>
<BLOCKQUOTE>
<P>People shouldn't be impressed when I say that my Howell Test of Galactic Threat (HTGT) is extremely accurate because I am never wrong. I simply make the same prediction for everyone--that they will not die from being hit on the head by a meteor--and I haven't been wrong yet.</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE><FONT face=Helvetica size=4><B>
<P>Multiple Independent Variables</P></B></FONT>
<BLOCKQUOTE>
<P>Epping-Jordan, Compas, and Howell (1994) were not really interested in the prediction of survival, although that's a good thing. They really wanted to know what role Avoidance and Intrusions played in outcomes.</P>
<P>Here we have another <B>hierarchical regression</B> question.</P>
<BLOCKQUOTE>
<P>Get them to see why we want to look at SurvRate <I>first</I>.</P></BLOCKQUOTE>
<P>The approach we will take is the hierarchical one of first entering SurvRate and then adding one or more other variables, such as Avoid or Intrus. The first part we have already seen.</P>
<BLOCKQUOTE>
<P>I could just enter both SurvRate and Intrus at the same time to see what I get. But by adding them at separate stages (using "Next" in the dialog box) I can get more useful information.</P>
<P>First enter SurvRate, and then add Intrus. Outcome is the dependent variable.</P></BLOCKQUOTE><FONT face=Monaco size=1>
<P style="BORDER-BOTTOM-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-TOP-STYLE: solid"><IMG alt="logistic3.gif (5937 bytes)" height=494 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/logistic3.gif" width=603></P></FONT>
<P><FONT size=3>In the earlier analysis we saw that without any predictors, LR chi-square = 77.3457.</FONT></P>
<P><FONT size=3>By Block 2 the result was 32.206. The difference between those (45.140) is a chi-square test of the experimental hypothesis that the model with 2 predictors (SurvRate and Intrus) predicts better than chance. It has 2 <EM>df</EM> because there are 2 predictors. This is significant (the printout shows <EM>p</EM> = .0000, i.e., <EM>p</EM> &lt; .0001).</FONT></P>
<P><FONT size=3>What I did not show, because we saw it before, is that at the first step, with just SurvRate, LR chi-square = 37.323. With two predictors, chi-square = 32.206. The difference between these is the test that Intrus adds something to the prediction <EM>over and above</EM> SurvRate. This difference is 5.118, which is shown above. It is on 1 <EM>df</EM>, and is significant at <EM>p</EM> = .0237. Thus Intrusive thoughts add to (actually subtract from) survivability after we control for the medical variables that are included in SurvRate.</FONT></P>
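Both likelihood ratio tests can be reproduced from the quoted deviances; a minimal Python sketch (using the closed-form chi-square tail probabilities for 1 and 2 df, so no stats library is needed):

```python
import math

# Deviances (-2 log likelihood) quoted in the text above
dev_null     = 77.3457  # no predictors
dev_survrate = 37.323   # SurvRate only
dev_two_pred = 32.206   # SurvRate + Intrus

def chi2_p(x, df):
    # Chi-square upper-tail probability for the two special cases needed here:
    # 1 df: p = erfc(sqrt(x/2));  2 df: p = exp(-x/2)
    assert df in (1, 2)
    return math.erfc(math.sqrt(x / 2)) if df == 1 else math.exp(-x / 2)

# Overall test on 2 df: do SurvRate and Intrus together beat chance?
overall = dev_null - dev_two_pred            # ~45.14
print(chi2_p(overall, 2) < .0001)            # True: highly significant

# Test on 1 df: does Intrus add anything over SurvRate?
step = dev_survrate - dev_two_pred           # ~5.117
print(round(chi2_p(step, 1), 4))             # 0.0237
```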
<P><FONT size=3><STRONG>Regression equation</STRONG></FONT></P>
<BLOCKQUOTE>
<P><FONT size=3>We can see that the optimal regression equation is</FONT></P>
<P><FONT size=3>log odds(worse) = -0.0823*SurvRate + .1325*Intrus + 1.1961</FONT></P>
<P><FONT size=3>We can also see the Wald test on these coefficients. Note that the test on Intrus gives <EM>p</EM> = .0349, which is somewhat different from the more accurate <EM>p</EM> = .0237 that we found above.</FONT></P>
<P><FONT size=3>If we want to go from log odds to odds, we see the result on the right.</FONT></P>
<BLOCKQUOTE>
<P><FONT size=3>e<SUP>-0.0823</SUP> = .9210</FONT></P>
<P><FONT size=3>e<SUP>.1325</SUP> = 1.1417</FONT></P></BLOCKQUOTE></BLOCKQUOTE>
<BLOCKQUOTE>
<P><FONT size=3>Thus a one point difference in SurvRate multiplies the odds of dying by .9210, when we control for Intrus. Likewise, a one point increase in Intrus multiplies the odds of dying by 1.1417 when we control for SurvRate.</FONT></P></BLOCKQUOTE>
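The equation above also lets us recover a predicted probability for any patient from the log odds; a minimal sketch (the SurvRate and Intrus values plugged in at the end are hypothetical, chosen only to illustrate the conversion):

```python
import math

def log_odds_worse(survrate, intrus):
    # Coefficients from the two-predictor equation in the text
    return -0.0823 * survrate + 0.1325 * intrus + 1.1961

def prob_worse(survrate, intrus):
    # Convert log odds to a probability: p = 1 / (1 + e^(-log odds))
    return 1 / (1 + math.exp(-log_odds_worse(survrate, intrus)))

# Odds ratios: exponentiate each coefficient
print(f"{math.exp(-0.0823):.4f}")  # 0.9210
print(f"{math.exp(0.1325):.4f}")   # 1.1417

# Hypothetical patient: SurvRate = 40, Intrus = 10
print(round(prob_worse(40, 10), 3))  # about .316
```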
<P><STRONG>What if we add Avoid as well?</STRONG></P>
<BLOCKQUOTE>
<P>The following is a greatly abbreviated output, just focusing on our problem.</P></BLOCKQUOTE>
<P style="BORDER-BOTTOM-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-TOP-STYLE: solid"><IMG alt="logistic4.gif (4798 bytes)" height=442 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/logistic4.gif" width=601></P>
<P>Notice that the test of adding Intrus and Avoid after SurvRate is not quite significant (<EM>p</EM> = .0586). Why not?</P>
<BLOCKQUOTE>
<P>Although Intrus has much to offer, Avoid has almost nothing. Adding Avoid at this stage decreases the model deviance a bit more, raising the improvement chi-square from 5.118 to 5.673, but we have spent a degree of freedom to do it. Whereas 5.118 on 1 <EM>df</EM> was significant, 5.673 on 2 <EM>df</EM> is not. (It would need to exceed 5.99.)</P>
<P>Note that Wald still calls Intrus significant.</P>
<P>We would be better off going back to the one predictor case.</P>
<P> </P>
<P>The following is an e-mail message that I received last year. I think that it brings up some interesting points. I don't expect you to remember it all, but I would like you to remember that it is here, and to refer to it if you need something like R-squared.</P>
<HR>
<P style="BORDER-BOTTOM-STYLE: solid; BORDER-LEFT-STYLE: solid; BORDER-RIGHT-STYLE: solid; BORDER-TOP-STYLE: solid">&gt;A colleague using multiple logistic regression would like to have:<BR><BR>&gt;(1) an overall measure of the explanatory power of the model, such as<BR>&gt;proportion of variance explained in linear regression, and ...<BR><BR>This issue has been considered extensively in the literature. Apparently there is little consensus and no true R^2 measure in logistic regression. Here are a couple of approaches you may consider.<BR><BR>(a) Obtain predicted values--probabilities--of your outcome and calculate the correlation between predicted probabilities and observed outcomes (a point-biserial correlation). The correlation between predicted Y and observed Y is exactly what R represents in linear regression. Agresti, A. (1996), An introduction to categorical data analysis, Wiley (p. 129), discusses this approach.<BR><BR>(b) Use the model deviance (-2 log likelihood) to calculate a reduction-in-error statistic. The deviance is analogous to sums of squares in linear regression, so one measure of proportional reduction in error--I think--that is similar to adjusted R^2 in linear regression would be:<BR><BR>pseudo R^2 = (DEV(null) - DEV(model))/DEV(null)<BR><BR>where DEV is the deviance, DEV(null) is the deviance for the null model (intercept only), and DEV(model) is the deviance for the fitted model.<BR><BR>There exist a number of methods for calculating pseudo R^2 values. A good discussion can be found in Maddala, G.S. (1983), Limited-dependent and qualitative variables in economics, Cambridge.<BR><BR>There are many published articles on this topic. Here are just a few.<BR><BR>Nagelkerke, N.J.D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3), 691-692.<BR><BR>Agresti, A. (1986). Applying R^2 type measures to ordered categorical data. Technometrics, 28(2), 133-138.<BR><BR>Laitila, T. (1993). A pseudo-R^2 measure for limited and qualitative dependent variable models. Journal of Econometrics, 56, 341-356.<BR><BR>Cox, D.R., &amp; Wermuth, N. (1992). A comment on the coefficient of determination for binary responses. The American Statistician, 46(1), 1-4.<BR><BR>&gt;(2) a way to compare the contributions of</P>
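Approach (b) can be tried directly on this lecture's own numbers; a minimal sketch, computing the proportional reduction in deviance (McFadden's pseudo R^2, i.e., 1 - DEV(model)/DEV(null)) from the deviances reported earlier:

```python
# Pseudo R^2 as proportional reduction in deviance (McFadden's measure),
# using the deviance values reported earlier in this lecture.
dev_null  = 77.3457   # intercept-only model
dev_model = 32.206    # SurvRate + Intrus

pseudo_r2 = 1 - dev_model / dev_null   # same as (dev_null - dev_model) / dev_null
print(round(pseudo_r2, 3))  # 0.584
```

As the e-mail warns, this is only one of several pseudo R^2 definitions, and none has the clean interpretation that R^2 has in linear regression.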