paper=20
solution is out of the question. We will use an iterative =
solution.=20
(Explain)</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE>
<P><B>Graphing the relationships</B></P>
<BLOCKQUOTE>
<P>I have talked about the shape of these distributions. I will jump =
ahead and=20
calculate the predicted probability of success as a function of =
SurvRate.</P>
<BLOCKQUOTE>
<P>To do this I just calculated the predicted log odds from SurvRate, =
then took=20
exp(log odds) to get odds, and then took prob =3D odds/(1+odds).</P>
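<P>The conversion just described can be sketched in a few lines of
Python. This is only an illustration: the function name and the log odds
values are assumptions chosen for the example, not the lecture data.</P>

```python
import math

def prob_from_log_odds(log_odds: float) -> float:
    """Convert log odds to a probability by way of the odds."""
    odds = math.exp(log_odds)      # exp(log odds) gives the odds
    return odds / (1.0 + odds)     # prob = odds / (1 + odds)

# Illustrative log odds values (made up, not taken from the lecture data).
for value in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"log odds {value:+.1f} gives prob {prob_from_log_odds(value):.3f}")
```

<P>Equal steps in log odds give unequal steps in probability, which is
where the sigmoid shape comes from.</P>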
<P><IMG border=3D0 height=3D384=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Logist4.gif"=20
width=3D480></P>
<P>Notice its sigmoid shape. (I could exaggerate it if I put in some =
cases=20
with even lower SurvRate.)</P>
<P>Now plot as odds against SurvRate</P>
<P><IMG border=3D0 height=3D257=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Logist3.gif"=20
width=3D451></P>
<P>That is very uninteresting, but I did it along the way. (Odds do =
extreme=20
things at the extremes.)</P>
<P>Now plot ln(odds) against SurvRate</P>
<P><IMG border=3D0 height=3D257=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Logist5.gif"=20
width=3D451></P>
<P><B>Notice that this is now linear.</B></P>
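<P>A small numerical check of the same point, as a sketch. The intercept
and slope here (b0, b1) and the x values are hypothetical, chosen only to
show that a curve in probability is a straight line in log odds.</P>

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, which must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical straight line in log odds; b0 and b1 are made-up values.
b0, b1 = 3.0, -0.1
xs = [10, 20, 30, 40, 50]
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]

# Step sizes between neighbouring x values, on each scale.
prob_steps = [round(b - a, 4) for a, b in zip(probs, probs[1:])]
logit_steps = [round(logit(b) - logit(a), 4) for a, b in zip(probs, probs[1:])]

print("steps in probability:", prob_steps)   # unequal steps
print("steps in log odds:   ", logit_steps)  # equal steps
```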
<P>That was the point of all this. I wanted to show that a set of =
data that=20
we would probably think of as curvilinear can be made linear, but we =
have to=20
remember that in doing so the thing that we are trying to predict is =
no=20
longer probability of success, but log odds of success. But there is =
nothing=20
to prevent us from converting the result we obtain back into =
statistics that=20
we are more used to dealing with.</P></BLOCKQUOTE></BLOCKQUOTE>
<P><B><FONT size=3D5>Running Logistic Regression with =
SPSS</FONT></B></P><FONT=20
face=3DHelvetica size=3D4><B>
<P>Step 1 with SPSS</B></FONT></P>
<BLOCKQUOTE>
<BLOCKQUOTE><B>
<P>Intercorrelation Matrix of Predictors</P></B>
<BLOCKQUOTE>
<P>Remember that these are linear relationships, but it gives us =
an idea=20
of where we are starting.</P></BLOCKQUOTE><FONT face=3DSystem =
size=3D2><B>
<P><IMG height=3D321=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image196.gif"=20
width=3D594></B></FONT></P>
<P><FONT size=3D3>I have simplified the output but the sample size =
was always=20
66 and the significance is shown by asterisks.</FONT></P>
<P>Now<FONT face=3DMonaco size=3D1> </FONT>we need to run the =
Logistic=20
Regression itself, with Outcome as the dv and SurvRate as the =
predictor.</P>
<P><FONT face=3DMonaco size=3D4><STRONG>SPSS Logistic=20
Regression</STRONG></FONT></P><FONT color=3D#000000 face=3D"Courier =
New" size=3D2>
<P>Dependent Variable.. OUTCOME Cancer Outcome</P>
<P><SPAN style=3D"BACKGROUND-COLOR: #ffff00">Dependent Variable=20
Encoding:<BR>Original Internal</SPAN></P>
<P><SPAN style=3D"BACKGROUND-COLOR: #ffff00">Value Value<BR>1.00 =
0<BR>2.00=20
1</SPAN></P>
<P>Beginning Block Number 0. Initial Log Likelihood Function</P>
<P>-2 Log Likelihood 77.345746</P>
<P>* Constant is included in the model.</FONT></P>
<P=20
style=3D"BORDER-BOTTOM-STYLE: solid; BORDER-LEFT-STYLE: solid; =
BORDER-RIGHT-STYLE: solid; BORDER-TOP-STYLE: solid"><IMG=20
alt=3D"logistic2.gif (5649 bytes)" height=3D468=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/logistic2.gif"=20
width=3D594></P></BLOCKQUOTE>
<P>Discuss this printout in detail.</P>
<BLOCKQUOTE>
<P>1. Note: <SPAN style=3D"BACKGROUND-COLOR: #ffff00">They immediately =
start by=20
converting the original values (which could have been 1 & 2) to =
0 &=20
1. They are going to predict a 0, which in our case, <I>after that=20
transformation,</I> is death (or getting worse).</SPAN></P>
<P>2. The next thing that we see is "-2 Log Likelihood"</P>
<BLOCKQUOTE>
<P>This is a model with just an intercept included. It is like =
testing a=20
linear regression model with just <IMG height=3D18=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image94.gif"=20
width=3D13> =3D <I>b</I><SUB>0</SUB> in it.</P>
<P>That model is very uninteresting, but it gives us a base to =
start=20
from.</P>
<P>-2 Log Likelihood =3D 77.346 is a chi-square statistic on 1 =
<I>df</I>,=20
which is clearly significant. But we don't care about its =
significance=20
here. A significant result means that the model does <B>not</B> =
fit the=20
data adequately, just as a traditional chi-square test is =
significant when=20
an independence model does not fit adequately.</P></BLOCKQUOTE>
<P>3. Then SPSS enters SurvRate as an independent variable and =
reports=20
another chi-square:</P>
<BLOCKQUOTE>
<P>-2 Log Likelihood =3D 37.323</P>
<P>This is a model with the intercept and SurvRate included. It is like =
testing a=20
linear regression model with just <IMG height=3D18=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image94.gif"=20
width=3D13> =3D <I>b</I><SUB>0</SUB>=20
+<I>b</I><SUB>1</SUB>*<I>X</I><SUB>1</SUB> in it, where=20
<I>X</I><SUB>1</SUB> =3D SurvRate.</P>
<P>This is a test on whether the new model, with SurvRate added, =
fits the=20
data. A significant chi-square would say that it does not fit the =
data=20
completely, though that certainly doesn't mean that it doesn't =
fit better=20
than the previous model.</P>
<P>This is a chi-square on <I>df</I> =3D number of predictors + 1 =
(the=20
constant) =3D 2, and the test is significant.</P><B>
<P>But</B> we aren't so much interested in whether it is a =
perfect fit as=20
we are in whether the model with SurvRate in it fits <B>better =
than</B>=20
the model without SurvRate. For that test we just find the amount =
of=20
<I>improvement</I> in chi-square.</P>
<BLOCKQUOTE>
<P>Improvement =3D 77.346 - 37.323 =3D =
40.023</P></BLOCKQUOTE>
<P>This is itself a chi-square on 2 - 1 =3D 1 <I>df</I>, and is =
certainly=20
significant. </P>
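<P>The improvement test can be checked numerically. The sketch below
uses the two -2 Log Likelihood values from the printout. The helper
name is an assumption, and the tail probability relies on the fact that
a chi-square on 1 df is a squared standard normal deviate.</P>

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square statistic on 1 df."""
    # With 1 df the statistic is a squared standard normal, so
    # P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

neg2ll_constant_only = 77.346   # -2 log likelihood, constant only
neg2ll_with_survrate = 37.323   # -2 log likelihood, SurvRate added

improvement = neg2ll_constant_only - neg2ll_with_survrate
p_value = chi2_sf_1df(improvement)

print(f"improvement = {improvement:.3f} on 1 df, p = {p_value:.2e}")
```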
<BLOCKQUOTE>
<P>In other words, SurvRate adds significantly to the prediction =
of=20
Outcome.</P></BLOCKQUOTE></BLOCKQUOTE>
<P>4. I deleted the classification table from the output. I think =
that they=20
are generally quite misleading, because even dreadful data can =
sometimes=20
have a high correct classification percentage.</P>
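<P>One way this can happen, sketched with made-up numbers: when one
outcome is far more common than the other, a rule that ignores every
predictor still classifies most cases correctly. The counts below are
hypothetical and are not the lecture data.</P>

```python
# Hypothetical outcomes: 60 of 66 cases improve (1) and 6 get worse (0).
outcomes = [1] * 60 + [0] * 6

# A "model" that ignores every predictor and always predicts improvement.
predictions = [1] * len(outcomes)

correct = sum(o == p for o, p in zip(outcomes, predictions))
percent_correct = 100.0 * correct / len(outcomes)

print(f"{percent_correct:.1f}% classified correctly")
```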
<P><B>5. The Regression Equation</B></P>
<BLOCKQUOTE>
<P>Log (odds Survival) =3D -.0812*SurvRate + 2.6836</P>
<P>This means that whenever two people differ by one point in =
SurvRate,=20
the <EM>log odds</EM> of survival differ by -.0812.</P>
<BLOCKQUOTE>
<P>Notice that this interpretation is the same as for normal =
regression,=20
except that we are predicting log odds.</P></BLOCKQUOTE>
<P>Take someone with a SurvRate =3D 50. Then</P>
<BLOCKQUOTE>
<P>log odds =3D -.0812(50) + 2.6836 =3D -1.3764</P>
<P>odds =3D <I>e</I><SUP>-1.3764</SUP> =3D .2525</P>
<BLOCKQUOTE>
<P>This means that they are only .25 times as likely to die as to =
survive.=20
(It is important to keep in mind whether we are predicting =
death or=20
survival.)</P>
<P>If we take the inverse we have 1/.25 =3D 4.0, which means =
that with a=20
50 you are 4 times more likely to live than=20
die.</P></BLOCKQUOTE></BLOCKQUOTE>
<P>Now someone with a 51 would have</P>
<BLOCKQUOTE>
<P>log odds =3D -.0812(51) + 2.6836 =3D -1.4576</P>
<P>odds =3D <I>e</I><SUP>-1.4576</SUP> =3D .2328</P></BLOCKQUOTE>
<P>The difference in log odds is -.0812, which is the =
coefficient.</P><B>
<P>But</B>, what does that mean?</P>
<P>Notice that as SurvRate increased the odds decreased.=20
<STRONG>BUT</STRONG> these are the odds of <STRONG>NOT</STRONG> =
surviving.=20
In other words, SPSS has chosen its own definition of "survival." =
You=20
always have to watch out for this in logistic regression, =
regardless of=20
the program you use.</P>
<BLOCKQUOTE>
<P>But if odds =3D <I>p</I>/(1-<I>p</I>), then <I>p</I> =3D =
odds/(1 + odds)=20
</P>
<P>For someone with a 50, <EM>p</EM> =3D .2525/1.2525 =3D =
.20</P>
<P>For someone with a 51, <I>p</I> =3D .2328/1.2328 =3D .19</P>
<P>If you have a SurvRate =3D 50, you are not too likely to die. =
In fact,=20
the probability of improving is .80. But if your survival rating =
increases to 51, the probability of your dying decreases a tiny =
bit to=20
.19. Thus higher survival ratings are associated with lower=20
probabilities of dying.</P>
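<P>The arithmetic for a SurvRate of 50 and 51 can be reproduced with the
coefficients from the printout. This is a sketch: the function and
constant names are assumptions, and the probability is for whichever
outcome the program is predicting (here, getting worse).</P>

```python
import math

B0 = 2.6836    # constant from the SPSS printout
B1 = -0.0812   # coefficient for SurvRate from the SPSS printout

def predicted(survrate: float):
    """Return (log odds, odds, probability) for one SurvRate value."""
    log_odds = B0 + B1 * survrate
    odds = math.exp(log_odds)
    prob = odds / (1.0 + odds)
    return log_odds, odds, prob

for rating in (50, 51):
    log_odds, odds, prob = predicted(rating)
    print(f"SurvRate {rating}: log odds {log_odds:.4f}, "
          f"odds {odds:.4f}, p {prob:.2f}")
```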
<P>The only way I know of for being sure which direction things =
are=20
going is to calculate a couple of probabilities and make sure =
you know=20
what they mean.</P>
<P>We could calculate the probability of surviving for every =
subject=20
using the above equation. In fact, SPSS will do that for us and =
SAVE all=20
of the predicted values. We can then make a scatterplot of =
predicted=20
values against SurvRate.</P>
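<P>Outside SPSS, the same predicted values could be computed directly
from the equation. In this sketch the SurvRate scores are hypothetical
stand-ins, since the individual cases are not listed in these notes.</P>

```python
import math

B0, B1 = 2.6836, -0.0812   # constant and SurvRate coefficient from the printout

# Hypothetical SurvRate scores standing in for the real cases.
survrates = [5, 20, 35, 50, 65, 80, 95]

# Predicted probability for each score, computed from the log odds.
predicted_probs = [
    1.0 / (1.0 + math.exp(-(B0 + B1 * s))) for s in survrates
]

for s, p in zip(survrates, predicted_probs):
    print(f"SurvRate {s:3d}: predicted probability {p:.3f}")
```

<P>Plotting predicted_probs against survrates would give the kind of
scatterplot described here.</P>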
<P>Notice that the probabilities (as calculated from log odds) =
do not=20
fall outside the range 0 to 1, and behave in just the ways I've been =
talking about.=20
This again makes it obvious that we are plotting probability of =
getting=20
worse, since it wouldn't make sense for the probabilities of =
survival to=20
decrease as the rating of survival increases.</P><FONT =
face=3DSystem=20
size=3D2><B>
<P><IMG height=3D269=20
=
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image198.gif"=20
width=3D336></B></FONT></P></BLOCKQUOTE>
<P>Notice the sigmoidal curve we have been talking=20
about.</P></BLOCKQUOTE></BLOCKQUOTE><B>
<P>More about the coefficients:</P></B>