paper=20
      solution is out of the question. We will use an iterative =
solution.=20
      (Explain)</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE>
<P><B>Graphing the relationships</B></P>
<BLOCKQUOTE>
  <P>I have talked about the shape of these distributions. I will jump =
ahead and=20
  calculate the predicted probability of success as a function of =
Survrate.</P>
  <BLOCKQUOTE>
    <P>To do this I just calculated the predicted log odds from Survrate, =
then took=20
    exp(log odds) to get odds, and then took prob =3D odds/(1+odds).</P>
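    <P>The chain of conversions just described can be sketched in a few
    lines of Python. This is only an illustration: the function name and
    the three log odds values are hypothetical and are not taken from the
    SPSS run.</P>

```python
import math

def log_odds_to_probability(log_odds):
    """Convert a log odds value to a probability by way of the odds."""
    odds = math.exp(log_odds)      # exp(log odds) gives the odds
    return odds / (1.0 + odds)     # prob = odds / (1 + odds)

# Hypothetical log odds values, chosen only for illustration.
for value in (-2.0, 0.0, 2.0):
    print(value, round(log_odds_to_probability(value), 4))
```

    <P>A log odds of zero corresponds to even odds, so the middle value
    should come out as a probability of one half.</P>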
    <P><IMG border=3D0 height=3D384=20
    =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Logist4.gif"=20
    width=3D480></P>
    <P>Notice its sigmoid shape. (I could exaggerate it if I put in some =
cases=20
    with even lower SurvRate.)</P>
    <P>Now plot as odds against SurvRate</P>
    <P><IMG border=3D0 height=3D257=20
    =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Logist3.gif"=20
    width=3D451></P>
    <P>That is very uninteresting, but I did it along the way. (Odds do =
extreme=20
    things at the extremes.)</P>
    <P>Now plot ln(odds) against SurvRate</P>
    <P><IMG border=3D0 height=3D257=20
    =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Logist5.gif"=20
    width=3D451></P>
    <P><B>Notice that this is now linear.</B></P>
    <P>That was the point of all this. I wanted to show that a set of =
data that=20
    we would probably think of as curvilinear can be made linear, but we =
have to=20
    remember that in doing so the thing that we are trying to predict is =
no=20
    longer probability of success, but log odds of success. But there is =
nothing=20
    to prevent us from converting the result we obtain back into =
statistics that=20
    we are more used to dealing with.</P></BLOCKQUOTE></BLOCKQUOTE>
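<P>The linearizing effect of the log odds transformation can be checked
numerically. The sketch below assumes a made-up intercept and slope (they
are not the values from these data) and shows that equal steps in the
predictor give unequal steps in probability but equal steps in log odds.</P>

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and slope, used only to generate a sigmoid curve.
b0, b1 = 3.0, -0.1
xs = [10, 20, 30, 40, 50]
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]

# Differences between neighbouring points on each scale.
prob_steps = [b - a for a, b in zip(probs, probs[1:])]
log_odds = [logit(p) for p in probs]
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]

print([round(s, 3) for s in prob_steps])       # unequal: the curve bends
print([round(s, 3) for s in log_odds_steps])   # equal: a straight line
```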
<P><B><FONT size=3D5>Running Logistic Regression with =
SPSS</FONT></B></P><FONT=20
face=3DHelvetica size=3D4><B>
<P>Step 1 with SPSS</B></FONT></P>
<BLOCKQUOTE>
  <BLOCKQUOTE><B>
    <P>Intercorrelation Matrix of Predictors</P></B>
    <BLOCKQUOTE>
      <P>Remember that these are linear relationships, but it gives us =
an idea=20
      of where we are starting.</P></BLOCKQUOTE><FONT face=3DSystem =
size=3D2><B>
    <P><IMG height=3D321=20
    =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image196.gif"=20
    width=3D594></B></FONT></P>
    <P><FONT size=3D3>I have simplified the output, but the sample size =
was always=20
    66 and the significance is shown by asterisks.</FONT></P>
    <P>Now<FONT face=3DMonaco size=3D1> </FONT>we need to run the =
Logistic=20
    Regression itself, with Outcome as the dv and Survrate as the =
predictor.</P>
    <P><FONT face=3DMonaco size=3D4><STRONG>SPSS Logistic=20
    Regression</STRONG></FONT></P><FONT color=3D#000000 face=3D"Courier =
New" size=3D2>
    <P>Dependent Variable.. OUTCOME Cancer Outcome</P>
    <P><SPAN style=3D"BACKGROUND-COLOR: #ffff00">Dependent Variable=20
    Encoding:<BR>Original Internal</SPAN></P>
    <P><SPAN style=3D"BACKGROUND-COLOR: #ffff00">Value Value<BR>1.00 =
0<BR>2.00=20
    1</SPAN></P>
    <P>Beginning Block Number 0. Initial Log Likelihood Function</P>
    <P>-2 Log Likelihood 77.345746</P>
    <P>* Constant is included in the model.</FONT></P>
    <P=20
    style=3D"BORDER-BOTTOM-STYLE: solid; BORDER-LEFT-STYLE: solid; =
BORDER-RIGHT-STYLE: solid; BORDER-TOP-STYLE: solid"><IMG=20
    alt=3D"logistic2.gif (5649 bytes)" height=3D468=20
    =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/logistic2.gif"=20
    width=3D594></P></BLOCKQUOTE>
  <P>Discuss this printout in detail.</P>
  <BLOCKQUOTE>
    <P>1. Note: <SPAN style=3D"BACKGROUND-COLOR: #ffff00">They immediately =
start by=20
    converting the original values (which could have been 1 &amp; 2) to =
0 &amp;=20
    1. They are going to predict a 0, which in our case, <I>after that=20
    transformation,</I> is death (or getting worse).</SPAN></P>
    <P>2. The next thing that we see is "-2 Log Likelihood"</P>
    <BLOCKQUOTE>
      <P>This is a model with just an intercept included. It is like =
testing a=20
      linear regression model with just <IMG height=3D18=20
      =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image94.gif"=20
      width=3D13> =3D <I>b</I><SUB>0</SUB> in it.</P>
      <P>That model is very uninteresting, but it gives us a base to =
start=20
      from.</P>
      <P>-2 Log Likelihood =3D 77.346 is a chi-square statistic on 1 =
<I>df</I>,=20
      which is clearly significant. But we don't care about its =
significance=20
      here. A significant result means that the model does <B>not</B> =
fit the=20
      data adequately, just as a traditional chi-square test is =
significant when=20
      an independence model does not fit adequately.</P></BLOCKQUOTE>
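    <P>For a model with only an intercept, every case gets the same
    predicted probability, namely the overall proportion in the predicted
    category, so -2 Log Likelihood can be computed by hand. The sketch
    below is an illustration under a loud assumption: the notes give the
    sample size (66) but not the split between outcome categories, and the
    count of 18 is picked here only because it gives a value close to the
    printed 77.346.</P>

```python
import math

def minus_two_log_likelihood_null(n_event, n_total):
    """-2 log likelihood when everyone gets the same predicted probability."""
    p = n_event / n_total   # overall proportion in the predicted category
    log_lik = n_event * math.log(p) + (n_total - n_event) * math.log(1.0 - p)
    return -2.0 * log_lik

# ASSUMPTION: 18 of the 66 cases fall in one outcome category.
# The split is not stated in the notes; it is a guess for illustration.
print(round(minus_two_log_likelihood_null(18, 66), 3))
```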
    <P>3. Then SPSS enters SurvRate as an independent variable and =
reports=20
    another chi-square:</P>
    <BLOCKQUOTE>
      <P>-2 Log Likelihood =3D 37.323</P>
      <P>This is a model with both the intercept and SurvRate included. =
It is like testing a=20
      linear regression model with <IMG height=3D18=20
      =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image94.gif"=20
      width=3D13> =3D <I>b</I><SUB>0</SUB>=20
      +<I>b</I><SUB>1</SUB>*<I>X</I><SUB>1</SUB> in it, where=20
      <I>X</I><SUB>1</SUB> =3D SurvRate.</P>
      <P>This is a test on whether the new model, with SurvRate added, =
fits the=20
      data. A significant chi-square would say that it does not fit the =
data=20
      completely, though that certainly doesn't mean that it doesn't =
fit better=20
      than the previous model.</P>
      <P>This is a chi-square on <I>df</I> =3D number of predictors + 1 =
(the=20
      constant)&nbsp; =3D 2, and the test is significant.</P><B>
      <P>But</B> we aren't so much interested in whether it is a =
perfect fit as=20
      we are in whether the model with SurvRate in it fits <B>better =
than</B>=20
      the model without SurvRate. For that test we just find the amount =
of=20
      <I>improvement</I> in chi-square.</P>
      <BLOCKQUOTE>
        <P>Improvement =3D 77.346 - 37.323 =3D =
40.023</P></BLOCKQUOTE>
      <P>This is itself a chi-square on 2 - 1 =3D 1 <I>df</I>, and is =
certainly=20
      significant. </P>
      <BLOCKQUOTE>
        <P>In other words, SurvRate adds significantly to the prediction =
of=20
        Outcome.</P></BLOCKQUOTE></BLOCKQUOTE>
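    <P>The improvement test can be reproduced with nothing more than the
    two printed -2 Log Likelihood values. The sketch below is an
    illustration, not SPSS output. It relies on the fact that a chi-square
    on 1 df is a squared standard normal deviate, so its upper-tail
    probability can be obtained from the complementary error function. The
    function name is made up for this example.</P>

```python
import math

def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square statistic on 1 df."""
    # A chi-square on 1 df is a squared standard normal deviate,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))

null_fit = 77.346    # -2 log likelihood, intercept only (from the printout)
model_fit = 37.323   # -2 log likelihood with SurvRate added (from the printout)

improvement = null_fit - model_fit
print(round(improvement, 3))
print(chi_square_1df_p_value(improvement))
```

    <P>The resulting probability is far below any conventional
    significance level, which agrees with the conclusion that SurvRate
    adds to the prediction.</P>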
    <P>4. I deleted the classification table from the output. I think =
that such=20
    tables are generally quite misleading, because even dreadful data can =
sometimes=20
    have a high correct classification percentage.</P>
    <P><B>5. The Regression Equation</B></P>
    <BLOCKQUOTE>
      <P>Log (odds Survival) =3D -.0812*SurvRate + 2.6836</P>
      <P>This means that whenever two people differ by one point in =
SurvRate,=20
      the <EM>log odds</EM> of survival differ by -.0812.</P>
      <BLOCKQUOTE>
        <P>Notice that this interpretation is the same as for normal =
regression,=20
        except that we are predicting log odds.</P></BLOCKQUOTE>
      <P>Take someone with a SurvRate =3D 50. Then</P>
      <BLOCKQUOTE>
        <P>log odds =3D -.0812(50) + 2.6836 =3D -1.3764</P>
        <P>odds =3D <I>e</I><SUP>-1.3764</SUP> =3D .2525</P>
        <BLOCKQUOTE>
          <P>This means that they are .25 times as likely to die as to =
survive.=20
          (It is important to keep in mind whether we are predicting =
death or=20
          survival.)</P>
          <P>If we take the inverse we have 1/.25 =3D 4.0, which means =
that with a=20
          50 you are 4 times as likely to live as to=20
      die.</P></BLOCKQUOTE></BLOCKQUOTE>
      <P>Now someone with a 51 would have</P>
      <BLOCKQUOTE>
        <P>log odds =3D -.0812(51) + 2.6836 =3D -1.4576</P>
        <P>odds =3D <I>e</I><SUP>-1.4576</SUP>=3D .2328</P></BLOCKQUOTE>
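      <P>The arithmetic for the two cases can be checked with a short
      Python sketch. The constant and coefficient are the ones in the
      printed equation. The function names are made up for this
      illustration.</P>

```python
import math

B0 = 2.6836    # constant from the printed equation
B1 = -0.0812   # coefficient for SurvRate from the printed equation

def log_odds(survrate):
    """Predicted log odds for a given SurvRate."""
    return B1 * survrate + B0

def odds(survrate):
    """Predicted odds, obtained by exponentiating the log odds."""
    return math.exp(log_odds(survrate))

for s in (50, 51):
    print(s, round(log_odds(s), 4), round(odds(s), 4))

# A one-point change in SurvRate shifts the log odds by the coefficient.
print(round(log_odds(51) - log_odds(50), 4))
```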
      <P>The difference in log odds is -.0812, which is the =
coefficient.</P><B>
      <P>But</B>, what does that mean?</P>
      <P>Notice that as Survrate increased the odds decreased.=20
      <STRONG>BUT</STRONG> these are the odds of <STRONG>NOT</STRONG> =
surviving.=20
      In other words, SPSS has chosen its own definition of "survival." =
You=20
      always have to watch out for this in logistic regression, =
regardless of=20
      the program you use.</P>
      <BLOCKQUOTE>
        <P>But if odds =3D <I>p</I>/(1-<I>p</I>), then <I>p</I> =3D =
odds/(1 + odds)=20
        </P>
        <P>For someone with a 50, <EM>p</EM> =3D .2525/1.2525 =3D =
.20</P>
        <P>For someone with a 51,<I> p</I> =3D .2328/1.2328 =3D .19</P>
        <P>If you have a SurvRate =3D 50, you are not too likely to die. =
In fact,=20
        the probability of improving is .80. But if your survival rating =

        increases to 51, the probability of your dying decreases a tiny =
bit to=20
        .19. Thus higher survival ratings are associated with lower=20
        probabilities of dying.</P>
        <P>The only way I know of for being sure which direction things =
are=20
        going is to calculate a couple of probabilities and make sure =
you know=20
        what they mean.</P>
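        <P>That direction check is easy to script. The sketch below uses
        the printed constant and coefficient and prints, for each
        SurvRate, the probability of the outcome being predicted (getting
        worse) and its complement. The function name is made up for this
        illustration.</P>

```python
import math

B0, B1 = 2.6836, -0.0812   # constant and SurvRate coefficient from the printout

def predicted_probability(survrate):
    """Probability of the outcome being predicted (here, getting worse)."""
    odds = math.exp(B1 * survrate + B0)
    return odds / (1.0 + odds)

for s in (50, 51):
    p = predicted_probability(s)
    # Columns: SurvRate, probability of getting worse, probability of improving
    print(s, round(p, 2), round(1.0 - p, 2))
```

        <P>If the first probability falls as SurvRate rises, the equation
        is predicting the unfavorable outcome.</P>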
        <P>We could calculate the probability of surviving for every =
subject=20
        using the above equation. In fact, SPSS will do that for us and =
SAVE all=20
        of the predicted values. We can then make a scatterplot of =
predicted=20
        values against SurvRate.</P>
        <P>Notice that the probabilities (as calculated from log odds) =
stay=20
        between 0 and 1, and behave in just the ways I've been =
talking about.=20
        This again makes it obvious that we are plotting probability of =
getting=20
        worse, since it wouldn't make sense for the probabilities of =
survival to=20
        decrease as the rating of survival increases.</P><FONT =
face=3DSystem=20
        size=3D2><B>
        <P><IMG height=3D269=20
        =
src=3D"http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regres=
sion/Image198.gif"=20
        width=3D336></B></FONT></P></BLOCKQUOTE>
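      <P>A rough stand-in for the saved predicted values can be produced
      from the equation alone. The sketch below is an illustration: the
      grid of SurvRate values from 0 to 100 is hypothetical and is not the
      66 actual cases, so it shows the shape of the curve rather than the
      scatterplot itself.</P>

```python
import math

B0, B1 = 2.6836, -0.0812   # constant and SurvRate coefficient from the printout

def predicted_probability(survrate):
    """Predicted probability of getting worse for a given SurvRate."""
    log_odds = B1 * survrate + B0
    return 1.0 / (1.0 + math.exp(-log_odds))   # same as odds / (1 + odds)

# Hypothetical grid of SurvRate values; the real data are not reproduced here.
grid = list(range(0, 101, 10))
probs = [predicted_probability(s) for s in grid]

for s, p in zip(grid, probs):
    print(f"{s:3d}  {p:.3f}")
```

      <P>Every value stays strictly between 0 and 1, and the values fall
      steadily as SurvRate rises.</P>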
      <P>Notice the sigmoidal curve we have been talking=20
  about.</P></BLOCKQUOTE></BLOCKQUOTE><B>
  <P>More about the coefficients:</P></B>