<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Logistic Regression 1st lecture</TITLE>
<META content="text/html; charset=gb2312" http-equiv=Content-Type>
<META content="none, default" name="Microsoft Border">
<META content="MSHTML 5.00.2614.3500" name=GENERATOR></HEAD>
<BODY>
<P><IMG alt="header.gif (5403 bytes)" height=92 src="http://hobbes.uvm.edu/gradstat/psych341/header.gif" width=372></P>
<HR>
<H1 align=center><FONT color=#006300>Logistic Regression</FONT></H1>
<H3 align=center><FONT color=#006300>4/11/00</FONT></H3>
<H2>Announcements</H2>
<UL>
<LI>Hand back assignments </LI></UL>
<H2>Logistic Regression</H2>
<BLOCKQUOTE>
<P>I am building on the foundation that I hope I laid on Thursday.</P>
<P><B>Definition</B>: Logistic regression is a technique for making predictions when the dependent variable is a dichotomy and the independent variables are continuous and/or discrete.</P>
<P>We are not really restricted to dichotomous dependent variables, because the technique can be modified to handle <B>polytomous logistic regression</B>, where the dependent variable can take on several levels. We have just exhausted my knowledge of the subject, but students can look in Hosmer and Lemeshow.</P>
<P>I am going to use the example from the text, because I want to have something they have seen before, and because I want to illustrate the differences between SAS, which the text uses, and SPSS, which we will use.</P>
<P><FONT size=4><B>Alternatives</B></FONT>:</P>
<BLOCKQUOTE>
<P><B>Discriminant analysis</B></P>
<BLOCKQUOTE>
<P>This is a more traditional approach, and the student's advisor may first suggest that route. Here the idea is that we are using one or more independent variables to predict "group membership," and there is no difference between "group membership" and "survivor/non-survivor."</P>
<P>The problem with discriminant analysis is that it requires certain normality assumptions that logistic regression does not. In addition, the emphasis there is really on putting people in groups, whereas it is easier to look at the underlying structure of the prediction ("what are the important predictors?") when using logistic regression.</P></BLOCKQUOTE></BLOCKQUOTE>
<BLOCKQUOTE>
<P><B>Linear Regression</B></P></BLOCKQUOTE>
<BLOCKQUOTE>
<BLOCKQUOTE>
<P>We <I>could</I> use plain old linear regression with a dichotomous dependent variable.</P>
<P>We have already seen this back in Chapter 9, when we talked about the point-biserial correlation.</P>
<P>In fact, it works pretty well when the probability of survival varies only between .20 and .80. It falls apart at the extremes, though probably not all that badly.</P>
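<P><I>One way to see why linear regression behaves acceptably in the .20&ndash;.80 range is to check numerically how far the logistic curve strays from a straight line there. A minimal sketch in Python (not part of the original lecture):</I></P>

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

# Tangent line to the logistic curve at x = 0: p = .5 + x/4.
# Scan the region where p runs from .20 to .80, i.e. x from ln(.2/.8)
# to ln(.8/.2), and find the worst disagreement with that straight line.
lo, hi = math.log(.2 / .8), math.log(.8 / .2)
xs = [lo + i * (hi - lo) / 200 for i in range(201)]
max_gap = max(abs(logistic(x) - (0.5 + x / 4)) for x in xs)
print(round(max_gap, 3))  # about .047 -- the curve is nearly straight here
```

<P><I>The probabilities never drift more than about .05 from the straight line in that middle region, which is why the linear model only misbehaves at the extremes.</I></P>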
<P>It assumes that the relationship between the independent variable(s) and the dependent variable is linear, whereas logistic regression assumes that the relationship is linear in the <I>log odds</I>, so that the probabilities follow an S-shaped logistic curve. The reason linear regression works in the non-extreme case is that the logistic curve is quite linear in the center. (Illustrate on board.)</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE>
<P><FONT face=Helvetica size=4><B>Example</B></FONT></P>
<BLOCKQUOTE>
<P>Epping-Jordan, Compas, &amp; Howell (1994)</P>
<BLOCKQUOTE>
<P>We were interested in looking at cancer outcomes as a function of psychological variables, specifically intrusions and avoidance behavior.</P>
<P>The emphasis here was on the <I>variables</I>, rather than on the <I>prediction</I>.</P></BLOCKQUOTE>
<P><B>Variables</B></P>
<UL>
<LI>ID
<LI>Outcome: 1 = Improved, 2 = Worse
<LI>SurvRate: higher scores = better prognosis
<LI>Prognosis
<LI>AmtTreat
<LI>GSI
<LI>Intrus
<LI>Avoid </LI></UL>
<P>I have discussed some of these variables before in other contexts, so I shouldn't need to go over them all.</P>
<P>What we are <I>really</I> interested in are Intrusions and Avoidance, but I need to start with a simple example, so I will start with the Survival Rating as the sole predictor. This also has the advantage of allowing me to ask whether those psychological variables have something to contribute <I>after</I> we control for disease variables.</P>
<P>We can plot the relationship between Outcome and Survival Rating, but keep in mind that there are overlapping points. To create this figure I altered the data for the outcome variable to let 1 = success and 0 = failure (no improvement, or worse).</P>
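<P><I>That recoding is a one-liner; a sketch in Python, with made-up outcome values purely for illustration:</I></P>

```python
# Recode Outcome (1 = Improved, 2 = Worse) into 1 = success, 0 = failure,
# matching the recoding used for the figure. The data values are illustrative.
outcomes = [1, 2, 2, 1, 1, 2]
success = [1 if o == 1 else 0 for o in outcomes]
print(success)  # [1, 0, 0, 1, 1, 0]
```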
<P>I have used a sunflower plot here. Every line in the "dot" represents a case, so if we have a dot, we have one case; a vertical line = 2 cases; a cross = 3 cases; etc. Notice that as we move from right to left we have most of the cases at Outcome = 2, then cases equally spread between Outcome = 1 and 2, and then most of the cases at Outcome = 1.</P>
<BLOCKQUOTE>
<FONT face=System size=2><B>
<P><IMG height=286 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/Image195.gif" width=476></B></FONT></P>
<P><FONT face=Monaco size=3>Draw the logistic function on this figure. The following was cut and pasted, after much work with an image editor, from the text. It plots:</FONT></P>
<FONT face=Monaco size=3>
<P><IMG alt="censor.gif (28120 bytes)" height=329 src="http://hobbes.uvm.edu/gradstat/psych341/lectures/Logistic%20Regression/censor.gif" width=588></P></FONT></BLOCKQUOTE>
<BLOCKQUOTE><FONT size=4><B>
<P>Censored Data</P></B></FONT></BLOCKQUOTE>
<BLOCKQUOTE>
<BLOCKQUOTE>
<P>Explain what censored data are, using Figure 15.8 from the book--see above.</P>
<P>Explain how this leads to sigmoidal data.</P>
<P>Ask them when they would expect to see censored data in what they do.</P>
<UL>
<LI>Plain old boring pass-fail measures
<LI>Extravert--introvert
<LI>Gender predicted from Bem's scale
<LI>Those admitted to graduate school versus those not admitted. </LI></UL></BLOCKQUOTE>
<FONT size=4><B>
<P>Odds Ratios</P></B></FONT>
<BLOCKQUOTE>
<P>The way most of us <EM>think</EM> about data like this is in terms of <I>probabilities</I>. We talk about the probability of survival.</P>
<P>But it is equally possible to think in terms of the odds of survival, and it works much better, statistically, if we talk about odds.</P>
<BLOCKQUOTE>
<P><B>Odds Survival</B> = Number Survived / Number Not-survived</P>
<P>or, equivalently,</P>
<P><B>Odds Survival</B> = <I>p</I>(survival) / (1 - <I>p</I>(survival))</P></BLOCKQUOTE>
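<P><I>The two definitions give the same number, which is easy to verify with a quick Python check (the counts are hypothetical, purely for illustration):</I></P>

```python
# Odds of survival computed two ways -- from raw counts and from the
# probability of survival. The counts are made up for this example.
survived, not_survived = 60, 15
odds_from_counts = survived / not_survived          # 60/15 = 4.0

p = survived / (survived + not_survived)            # p(survival) = .8
odds_from_p = p / (1 - p)                           # .8/.2 = 4.0

print(odds_from_counts, round(odds_from_p, 10))  # both are 4.0
```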
<P>If we had an unlimited number of subjects, and therefore lots of subjects at each survival rating, we could calculate these odds. But we don't have an unlimited number of points, and therefore we can't really get them for every point. But that doesn't mean we can't operate <I>as if</I> we could.</P>
<P><B>Draw such a figure on the board.</B></P>
<BLOCKQUOTE>
<P>At the very least, we will have such a problem at the high end, where once you get high enough you can't really get much higher. If a score of 70 gives you a probability of .96 of survival, a score of 80 can, <I>at most</I>, move you up .04.</P></BLOCKQUOTE>
<P>Now we have to go one step further to get the <B>log odds</B>.</P>
<P>The <B>log odds</B> will allow the relationship I discussed just above to become linear.</P>
<BLOCKQUOTE>
<P><B>log odds survival</B> = ln(odds) = ln(<I>p</I>/(1 - <I>p</I>))</P>
<P>Notice that, by tradition, we use the natural logarithm rather than log<SUB>10</SUB>.</P>
<P>This is often called the <B>logit</B> or the <B>logit transform</B>.</P></BLOCKQUOTE>
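<P><I>The logit transform and its inverse fit in a few lines of Python (the .96 echoes the survival probability used above; this sketch is not part of the original lecture):</I></P>

```python
import math

def logit(p):
    """The logit transform: ln(p / (1 - p)), the natural log of the odds."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Back from log odds to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-x))

print(logit(0.5))                            # 0.0 -- even odds
print(round(logit(0.96), 4))                 # 3.1781, i.e. ln(.96/.04) = ln(24)
print(round(inverse_logit(logit(0.96)), 4))  # 0.96 -- the round trip
```

<P><I>Note how the logit stretches probabilities near 0 and 1 out over the whole real line, which is exactly what lets the relationship become linear.</I></P>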
<P>We will work with the logit, and will solve for the equation</P>
<BLOCKQUOTE>
<P><STRONG>log(<I>p</I>/(1 - <I>p</I>)) = log(odds) = logit = <I>b</I><SUB>0</SUB> + <I>b</I><SUB>1</SUB>SurvRate</STRONG></P>
<P>This is just a plain old <EM>linear equation</EM>, because we are using logs. That's why we switched to logs in the first place. The equation would not be linear in terms of odds, as we saw in the graph above.</P>
<P><I>b</I><SUB>0</SUB> is the intercept, and we usually don't care about it.</P>
<P><I>b</I><SUB>1</SUB> is the slope, and is the change in the <I>log</I> odds for a one-unit change in SurvRate.</P></BLOCKQUOTE>
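<P><I>To make the slope concrete: adding <I>b</I><SUB>1</SUB> to the log odds multiplies the odds themselves by e<SUP>b1</SUP>. A Python sketch with hypothetical coefficients (the real ones come from the fitted model, not from here):</I></P>

```python
import math

# Hypothetical coefficients for illustration only -- not the fitted values.
b0, b1 = -2.0, 0.08

def predicted_odds(survrate):
    """Odds implied by the linear model on the logit scale."""
    return math.exp(b0 + b1 * survrate)

def predicted_p(survrate):
    """Convert the logit back to a predicted probability of survival."""
    logit = b0 + b1 * survrate
    return 1 / (1 + math.exp(-logit))

# A one-unit increase in SurvRate multiplies the odds by exp(b1),
# no matter where on the scale we start.
ratio = predicted_odds(51) / predicted_odds(50)
print(round(ratio, 6), round(math.exp(b1), 6))  # 1.083287 1.083287
print(round(predicted_p(50), 3))                # 0.881
```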
<P>We will solve for all of this by magic, since a pencil-and-paper solution is not really an option.</P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>