<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Logistic Regression 1st lecture</TITLE>
<META content="text/html; charset=gb2312" http-equiv=Content-Type>
<META content="none, default" name="Microsoft Border">
<META content="MSHTML 5.00.2614.3500" name=GENERATOR></HEAD>
<BODY>
<P><IMG alt="header.gif (5403 bytes)" height=92 src="http://hobbes.uvm.edu/gradstat/psych341/header.gif" width=372></P>
<HR>
<H1 align=center><FONT color=#006300>Logistic Regression</FONT></H1>
<H3 align=center><FONT color=#006300>4/11/00</FONT></H3>
<H2>Announcements</H2>
<UL>
<LI>Hand back assignments </LI></UL>
<H2>Logistic Regression</H2>
<BLOCKQUOTE>
<P>I am building on the foundation that I hope I laid on Thursday.</P>
<P><B>Definition</B>: Logistic regression is a technique for making predictions when the dependent variable is a dichotomy and the independent variables are continuous and/or discrete.</P>
<P>We are not really restricted to dichotomous dependent variables, because the technique can be modified to handle <B>polytomous logistic regression</B>, where the dependent variable can take on several levels. We have just exhausted my knowledge of the subject, but students can look in Hosmer and Lemeshow.</P>
<P>I am going to use the example from the text, because I want to have something they have seen before, and because I want to illustrate the differences between SAS, which the text uses, and SPSS, which we will use.</P>
<P><FONT size=4><B>Alternatives</B></FONT>:</P>
<BLOCKQUOTE>
<P><B>Discriminant analysis</B></P>
<BLOCKQUOTE>
<P>This is a more traditional approach, and the student's advisor may first suggest that route. Here the idea is that we are using one or more independent variables to predict "group membership," and there is no difference between "group membership" and "survivor/non-survivor."</P>
<P>The problem with discriminant analysis is that it requires certain normality assumptions that logistic regression does not. In addition, the emphasis there is really on putting people in groups, whereas it is easier to look at the underlying structure of the prediction ("what are the important predictors?") when using logistic regression.</P></BLOCKQUOTE></BLOCKQUOTE>
<BLOCKQUOTE>
<P><B>Linear Regression</B></P></BLOCKQUOTE>
<BLOCKQUOTE>
<BLOCKQUOTE>
<P>We <I>could</I> use plain old linear regression with a dichotomous dependent variable.</P>
<P>We have already seen this back in Chapter 9, when we talked about the point-biserial correlation.</P>
<P>In fact, it works pretty well when the probability of survival varies only between .20 and .80. It falls apart at the extremes, though probably not all that badly.</P>
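<P><I>One way to see why linear regression behaves acceptably in the .20&ndash;.80 range is to check numerically how far the logistic curve strays from a straight line there. A minimal sketch in Python (not part of the original lecture):</I></P>

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

# Tangent line to the logistic curve at x = 0: p = .5 + x/4.
# Scan the region where p runs from .20 to .80, i.e. x from ln(.2/.8)
# to ln(.8/.2), and find the worst disagreement with that straight line.
lo, hi = math.log(.2 / .8), math.log(.8 / .2)
xs = [lo + i * (hi - lo) / 200 for i in range(201)]
max_gap = max(abs(logistic(x) - (0.5 + x / 4)) for x in xs)
print(round(max_gap, 3))  # about .047 -- the curve is nearly straight here
```

<P><I>The probabilities never drift more than about .05 from the straight line in that middle region, which is why the linear model only misbehaves at the extremes.</I></P>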
<P>It assumes that the relationship between the independent variable(s) and the dependent variable is linear, whereas logistic regression assumes that the relationship is linear in the <I>log odds</I>, so that the probabilities follow an S-shaped logistic curve. The reason linear regression works in the non-extreme case is that the logistic curve is quite linear in the center. (Illustrate on board.)</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE>
<P><FONT face=Helvetica size=4><B>Example</B></FONT></P>
<BLOCKQUOTE>
<P>Epping-Jordan, Compas, &amp; Howell (1994)</P>
<BLOCKQUOTE>
<P>We were interested in looking at cancer outcomes as a function of psychological variables, specifically intrusions and avoidance behavior.</P>
<P>The emphasis here was on the <I>variables</I>, rather than on the <I>prediction</I>.</P></BLOCKQUOTE>
<P><B>Variables</B></P>
<UL>
<LI>ID
<LI>Outcome: 1 = Improved, 2 = Worse
<LI>SurvRate: higher scores = better prognosis
<LI>Prognosis
<LI>AmtTreat
<LI>GSI
<LI>Intrus
<LI>Avoid </LI></UL>
<P>I have discussed some of these variables before in other contexts, so I shouldn't need to go over them all.</P>
<P>What we are <I>really</I> interested in are Intrusions and Avoidance, but I need to start with a simple example, so I will start with the Survival Rating as the sole predictor. This also has the advantage of allowing me to ask whether those psychological variables have something to contribute <I>after</I> we control for disease variables.</P>
<P>We can plot the relationship between Outcome and Survival Rating, but keep in mind that there are overlapping points. To create this figure I altered the data for the outcome variable to let 1 = success and 0 = failure (no improvement, or worse).</P>
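<P><I>That recoding is a one-liner; a sketch in Python, with made-up outcome values purely for illustration:</I></P>

```python
# Recode Outcome (1 = Improved, 2 = Worse) into 1 = success, 0 = failure,
# matching the recoding used for the figure. The data values are illustrative.
outcomes = [1, 2, 2, 1, 1, 2]
success = [1 if o == 1 else 0 for o in outcomes]
print(success)  # [1, 0, 0, 1, 1, 0]
```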
<P>I have used a sunflower plot here. Every line in the "dot" represents a case, so if we have a dot, we have one case; a vertical line = 2 cases; a cross = 3 cases; etc. Notice that as we move from right to left we have most of the cases at Outcome = 2, then cases equally spread between Outcome = 1 and 2, and then most of the cases at Outcome = 1.</P>
<BLOCKQUOTE>
<FONT face=System size=2><B>
<P><IMG height=286 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/Image195.gif" width=476></B></FONT></P>
<P><FONT face=Monaco size=3>Draw the logistic function on this figure. The following was cut and pasted, after much work with an image editor, from the text. It plots:</FONT></P>
<FONT face=Monaco size=3>
<P><IMG alt="censor.gif (28120 bytes)" height=329 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/censor.gif" width=588></P></FONT></BLOCKQUOTE>
<BLOCKQUOTE><FONT size=4><B>
<P>Censored Data</P></B></FONT></BLOCKQUOTE>
<BLOCKQUOTE>
<BLOCKQUOTE>
<P>Explain what censored data are, using Figure 15.8 from the book--see above.</P>
<P>Explain how this leads to sigmoidal data.</P>
<P>Ask them when they would expect to see censored data in what they do.</P>
<UL>
<LI>Plain old boring pass-fail measures
<LI>Extravert--introvert
<LI>Gender predicted from Bem's scale
<LI>Those admitted to graduate school versus those not admitted. </LI></UL></BLOCKQUOTE>
<FONT size=4><B>
<P>Odds Ratios</P></B></FONT>
<BLOCKQUOTE>
<P>The way most of us <EM>think</EM> about data like this is in terms of <I>probabilities</I>. We talk about the probability of survival.</P>
<P>But it is equally possible to think in terms of the odds of survival, and it works much better, statistically, if we talk about odds.</P>
<BLOCKQUOTE>
<P><B>Odds Survival</B> = Number Survived / Number Not-survived</P>
<P>or, equivalently,</P>
<P><B>Odds Survival</B> = <I>p</I>(survival) / (1 - <I>p</I>(survival))</P></BLOCKQUOTE>
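<P><I>The two definitions give the same number, which is easy to verify with a quick Python check (the counts are hypothetical, purely for illustration):</I></P>

```python
# Odds of survival computed two ways -- from raw counts and from the
# probability of survival. The counts are made up for this example.
survived, not_survived = 60, 15
odds_from_counts = survived / not_survived          # 60/15 = 4.0

p = survived / (survived + not_survived)            # p(survival) = .8
odds_from_p = p / (1 - p)                           # .8/.2 = 4.0

print(odds_from_counts, round(odds_from_p, 10))  # both are 4.0
```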
<P>If we had an unlimited number of subjects, and therefore lots of subjects at each survival rating, we could calculate these odds. But we don't have an unlimited number of points, and therefore we can't really get them for every point. But that doesn't mean we can't operate <I>as if</I> we could.</P>
<P><B>Draw such a figure on the board.</B></P>
<BLOCKQUOTE>
<P>At the very least, we will have such a problem at the high end, where once you get high enough you can't really get much higher. If a score of 70 gives you a probability of .96 of survival, a score of 80 can, <I>at most</I>, move you up .04.</P></BLOCKQUOTE>
<P>Now we have to go one step further to get the <B>log odds</B>.</P>
<P>The <B>log odds</B> will allow the relationship I discussed just above to become linear.</P>
<BLOCKQUOTE>
<P><B>log odds survival</B> = ln(odds) = ln(<I>p</I>/(1 - <I>p</I>))</P>
<P>Notice that, by tradition, we use the natural logarithm rather than log<SUB>10</SUB>.</P>
<P>This is often called the <B>logit</B> or the <B>logit transform</B>.</P></BLOCKQUOTE>
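<P><I>The logit transform and its inverse fit in a few lines of Python (the .96 echoes the survival probability used above; this sketch is not part of the original lecture):</I></P>

```python
import math

def logit(p):
    """The logit transform: ln(p / (1 - p)), the natural log of the odds."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Back from log odds to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-x))

print(logit(0.5))                            # 0.0 -- even odds
print(round(logit(0.96), 4))                 # 3.1781, i.e. ln(.96/.04) = ln(24)
print(round(inverse_logit(logit(0.96)), 4))  # 0.96 -- the round trip
```

<P><I>Note how the logit stretches probabilities near 0 and 1 out over the whole real line, which is exactly what lets the relationship become linear.</I></P>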
<P>We will work with the logit, and will solve for the equation</P>
<BLOCKQUOTE>
<P><STRONG>log(<I>p</I>/(1 - <I>p</I>)) = log(odds) = logit = <I>b</I><SUB>0</SUB> + <I>b</I><SUB>1</SUB>SurvRate</STRONG></P>
<P>This is just a plain old <EM>linear equation</EM>, because we are using logs. That's why we switched to logs in the first place. The equation would not be linear in terms of odds, as we saw in the graph above.</P>
<P><I>b</I><SUB>0</SUB> is the intercept, and we usually don't care about it.</P>
<P><I>b</I><SUB>1</SUB> is the slope, and is the change in the <I>log</I> odds for a one-unit change in SurvRate.</P></BLOCKQUOTE>
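<P><I>To make the slope concrete: adding <I>b</I><SUB>1</SUB> to the log odds multiplies the odds themselves by e<SUP>b1</SUP>. A Python sketch with hypothetical coefficients (the real ones come from the fitted model, not from here):</I></P>

```python
import math

# Hypothetical coefficients for illustration only -- not the fitted values.
b0, b1 = -2.0, 0.08

def predicted_odds(survrate):
    """Odds implied by the linear model on the logit scale."""
    return math.exp(b0 + b1 * survrate)

def predicted_p(survrate):
    """Convert the logit back to a predicted probability of survival."""
    logit = b0 + b1 * survrate
    return 1 / (1 + math.exp(-logit))

# A one-unit increase in SurvRate multiplies the odds by exp(b1),
# no matter where on the scale we start.
ratio = predicted_odds(51) / predicted_odds(50)
print(round(ratio, 6), round(math.exp(b1), 6))  # 1.083287 1.083287
print(round(predicted_p(50), 3))                # 0.881
```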
<P>We will solve for all of this by magic, since a pencil-and-paper solution is not really an option.</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>