</PRE>This is what the histogram looks like:<BR><IMG SRC="mouse1.gif" WIDTH="400"></TD></TR></TABLE>
<TABLE WIDTH="300"><TR><TD><B>Control Group</B><PRE>control=[52 104 146 10 51 30 40 27 46]';
>> median(control)
ans = 46
>> mean(control)
ans = 56.2222
>> var(control)
ans = 1.8042e+03
>> var(control)/length(control)
ans = 200.4660
>> sqrt(200.4660)   % standard error of the mean
ans = 14.1586
% store the median of each of 1000 bootstrap resamples of control
thetab=zeros(1,1000);
for b = 1:1000
    thetab(b)=median(bsample(control));
end
hist(thetab)
>> sqrt(var(thetab))   % bootstrap estimate of the standard error of the median
ans = 11.9218
>> mean(thetab)
ans = 45.4370</PRE>This is what the histogram looks like:<BR><IMG SRC="mouse2.gif" WIDTH="400"></TD></TR></TABLE>
<P>To compare the two medians, could we use these estimates of the standard errors to find out whether the difference between them is significant?
<P>
<H2><A NAME="SECTION00252000000000000000">The combinatorics of the bootstrap distribution</A></H2>
As we noted in class, and as the histograms show, the main feature of the bootstrap distribution of the median is that it can take on very few values: in the case of the treatment group, for instance, only <IMG WIDTH="14" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img48.png" ALT="$7$">. This is because, when the sample size is odd, the median of a resample is always one of the original observations, so it can take at most as many distinct values as there are observations.
The simple bootstrap always shows this discreteness, even when we know that the underlying distribution is continuous. There are ways to fix this, and in many cases it won't matter, but it is an important feature to keep in mind.
<H3><A NAME="SECTION00252100000000000000">How many different bootstrap samples are there?</A></H3>
By different samples we mean samples that differ as sets (more precisely as multisets, since repetitions are allowed); that is, there is no difference between the sample <!-- MATH $\{x_1,x_2,\ldots,x_n\}$ --><IMG WIDTH="130" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img49.png" ALT="$\{x_1,x_2,\ldots,x_n\}$"> and the sample <!-- MATH $\{x_2,x_1,\ldots , x_n \}$ --><IMG WIDTH="130" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img50.png" ALT="$\{x_2,x_1,\ldots , x_n \}$">. This is the case when either the observations are exchangeable or the statistic of interest is a symmetric function <IMG WIDTH="13" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img51.png"
ALT="$s$"> of the sample, <!-- MATH $\hat{\theta}=s(\mbox{${\cal X}$})$ --><IMG WIDTH="79" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img52.png" ALT="$\hat{\theta}=s(\mbox{${\cal X}$})$">.<BR>Definition:<BR>
<A NAME="def:exchangeable"></A><A NAME="522"></A>The sequence <!-- MATH $(X_1,X_2,\ldots,X_n)$ --><IMG WIDTH="140" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img53.png" ALT="$(X_1,X_2,\ldots,X_n)$"> of random variables is said to be <FONT COLOR="#ff0000">exchangeable</FONT> if the distribution of the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$">-vector <!-- MATH $(X_1,X_2,\ldots,X_n)$ --><IMG WIDTH="140" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img53.png" ALT="$(X_1,X_2,\ldots,X_n)$"> is the same as that of <!-- MATH $(X_{\pi(1)},X_{\pi(2)},\ldots,X_{\pi(n)})$ --><IMG WIDTH="196" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img54.png" ALT="$(X_{\pi(1)},X_{\pi(2)},\ldots,X_{\pi(n)})$"> for any permutation <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img55.png" ALT="$\pi$"> of <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> elements.
<P>Suppose we condition on the sample <!-- MATH $\mbox{${\cal X}$}$ --><IMG WIDTH="21" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img56.png" ALT="$\mbox{${\cal X}$}$"> of <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> distinct observations. Then there are as many different resamples as there are ways of choosing <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> objects out of a set of <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> possible contenders, with repetitions allowed and order ignored.
<P>At this point it is useful to introduce a new notation for a bootstrap resample. Up to now we have written a possible resample as, say, <!-- MATH $\mbox{${\cal X}$}^{*b}=\{x_1,x_1,x_3,x_4,x_4\}$ --><IMG WIDTH="203" HEIGHT="43" ALIGN="MIDDLE" BORDER="0" SRC="img57.png" ALT="$\mbox{${\cal X}$}^{*b}=\{x_1,x_1,x_3,x_4,x_4\}$">. Because of the exchangeability/symmetry property, we can recode this as the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$">-vector counting the number of occurrences of each of the observations. In this recoding we have <!-- MATH $\mbox{${\cal X}$}^{*b}=(2,0,1,2,0)$ --><IMG WIDTH="154" HEIGHT="43" ALIGN="MIDDLE" BORDER="0" SRC="img58.png" ALT="$\mbox{${\cal X}$}^{*b}=(2,0,1,2,0)$">, and the set of all bootstrap resamples is the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$">-dimensional <FONT COLOR="#ff0000">simplex</FONT> <A NAME="def:simplex"></A><A NAME="533"></A>
<BR><P></P>
<DIV ALIGN="CENTER"><!-- MATH \begin{displaymath}C_n=\{(k_1,k_2,\ldots,k_n), k_i \in \N, \sum k_i=n \}\end{displaymath} --><IMG WIDTH="311" HEIGHT="36" BORDER="0" SRC="img59.png" ALT="\begin{displaymath}C_n=\{(k_1,k_2,\ldots,k_n), k_i \in \N, \sum k_i=n \}\end{displaymath}"></DIV>
<BR CLEAR="ALL"><P></P>
Here is the argument I used in class to explain how large <IMG WIDTH="27" HEIGHT="35" ALIGN="MIDDLE" BORDER="0" SRC="img60.png" ALT="$C_n$"> is. Each component of the vector is considered to be a box: there are <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> boxes to hold <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> balls in all, and we want to count the number of ways of distributing the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> balls among the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> boxes. Put down <IMG WIDTH="48" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img61.png" ALT="$n-1$"> separators <IMG WIDTH="10" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img62.png" ALT="$\vert$"> to make the boxes, together with the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> balls: there are then <IMG WIDTH="58" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img63.png" ALT="$2n-1$"> slots from which to choose the <IMG WIDTH="48"
HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img61.png" ALT="$n-1$"> bars' positions. For instance, our vector above corresponds to <TT>oo||o|oo|</TT>. Thus
<BR><P></P>
<DIV ALIGN="CENTER"><!-- MATH \begin{displaymath}|C_n|={{2n-1}\choose{n-1}}\end{displaymath} --><IMG WIDTH="131" HEIGHT="54" BORDER="0" SRC="img64.png" ALT="\begin{displaymath}\vert C_n\vert={{2n-1}\choose{n-1}}\end{displaymath}"></DIV>
<BR CLEAR="ALL"><P></P>
Stirling's formula (<!-- MATH $n!\sim n^ne^{-n}(2\pi n)^{\frac{1}{2}}$ --><IMG WIDTH="152" HEIGHT="46" ALIGN="MIDDLE" BORDER="0" SRC="img65.png" ALT="$n!\sim n^ne^{-n}(2\pi n)^{\frac{1}{2}}$">) gives the approximation <!-- MATH $C_n \sim (n\pi)^{-\frac{1}{2}} 2^{2n-1}$ --><IMG WIDTH="153" HEIGHT="46" ALIGN="MIDDLE" BORDER="0" SRC="img66.png" ALT="$C_n \sim (n\pi)^{-\frac{1}{2}} 2^{2n-1}$">.
<P>Here is the function file <TT>approxcom.m</TT>
<PRE>function out=approxcom(n)
% Stirling approximation to nchoosek(2*n-1,n-1), the number of distinct resamples
out=round((pi*n)^(-.5)*2^(2*n-1));</PRE>
which produces the following table of the approximate number of distinct resamples:<BR>
<IMG WIDTH="575" HEIGHT="48" ALIGN="BOTTOM" BORDER="0" SRC="img67.png" ALT="\begin{array}{\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert}\hl......6232& 78207663 & 6.93 10^{10}& 6.35 10^{13} &5.94 10^{16}\\\hline\end{array}"><BR>
<P>Are all these samples equally likely? Thinking about the probability of drawing the sample made up only of <IMG WIDTH="23" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img68.png" ALT="$x_1$">'s should persuade you that they are not: it requires choosing the index <IMG WIDTH="14" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img69.png" ALT="$1$"> all <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> times in the uniform integer generation, so it arises from only one of the <IMG WIDTH="25" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img70.png" ALT="$n^{n}$"> equally likely sequences of indices. By contrast, the sample with <IMG WIDTH="23" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img68.png" ALT="$x_1$"> once and <IMG WIDTH="23" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img71.png" ALT="$x_2$"> for all the other observations arises from <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> of the <IMG WIDTH="25" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img70.png" ALT="$n^{n}$"> sequences, one for each position of <IMG WIDTH="23" HEIGHT="33" ALIGN="MIDDLE" BORDER="0" SRC="img68.png" ALT="$x_1$">.
<H3><A NAME="SECTION00252200000000000000">Which is the most likely bootstrap sample?</A></H3>
The most likely resample is the original sample <!-- MATH $\mbox{${\cal X}$}=\{x_1,x_2,...,x_n\}$ --><IMG WIDTH="162" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img72.png" ALT="$\mbox{${\cal X}$}=\{x_1,x_2,...,x_n\}$"> itself; the easiest way to see this is to consider the multinomial distribution.
<H3><A NAME="SECTION00252300000000000000"></A><A NAME="def:multinomial"></A><A NAME="558"></A><BR>The <FONT COLOR="#ff0000">multinomial</FONT> distribution</H3>
In fact, when we draw bootstrap resamples we are simply drawing a vector <!-- MATH $(k_1,k_2,...k_n)$ --><IMG WIDTH="105" HEIGHT="37" ALIGN="MIDDLE" BORDER="0" SRC="img73.png" ALT="$(k_1,k_2,...k_n)$"> from the multinomial distribution with <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> trials and <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> equally likely categories, <!-- MATH $p_i=\frac{1}{n}$ --><IMG WIDTH="57" HEIGHT="40" ALIGN="MIDDLE" BORDER="0" SRC="img74.png" ALT="$p_i=\frac{1}{n}$">, so that the probability of a given vector is
<BR><P></P>
<DIV ALIGN="CENTER"><!-- MATH \begin{displaymath}Prob_{boot}(k_1,k_2,\ldots,k_n)=\frac{n!}{k_1!k_2!\cdots k_n!}\left(\frac{1}{n}\right)^{k_1+k_2+\cdots+k_n}={{n}\choose{k_1,k_2,\ldots,k_n}} n^{-n}\end{displaymath} --><IMG WIDTH="594" HEIGHT="54" BORDER="0" SRC="img75.png" ALT="\begin{displaymath}Prob_{boot}(k_1,k_2,\ldots,k_n)=\frac{n!}{k_1!k_2!\cdots k_n!}\left(\frac{1}{n}\right)^{k_1+k_2+\cdots+k_n}={{n}\choose{k_1,k_2,\ldots,k_n}} n^{-n}\end{displaymath}"></DIV>
<BR CLEAR="ALL"><P></P>
Since the factor <I>n</I><SUP>-<I>n</I></SUP> is the same for every vector, this probability is largest when the product of the factorials <I>k</I><SUB><I>i</I></SUB>! in the denominator is smallest. That product is always at least 1, and it equals 1 only when every count is 0 or 1; because the <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$"> counts sum to <IMG WIDTH="16" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img28.png" ALT="$n$">, this forces all the <IMG WIDTH="20" HEIGHT="35" ALIGN="MIDDLE" BORDER="0" SRC="img76.png" ALT="$k_i$">'s to equal <IMG WIDTH="14" HEIGHT="16" ALIGN="BOTTOM" BORDER="0" SRC="img69.png" ALT="$1$">. Thus the most likely bootstrap resample is the original sample, with probability <I>n</I>!&nbsp;<I>n</I><SUP>-<I>n</I></SUP>. Here is a table of these largest probabilities:<BR>
<IMG WIDTH="528" HEIGHT="48" ALIGN="BOTTOM" BORDER="0" SRC="img77.png" ALT="\begin{array}{\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert}\hl......} & 5.4\times10^{-5} & 3\times 10^{-6} &2.3\times 10^{-8}\\\hline\end{array}"><BR>
Since even the most likely resample has a very small probability, as long as the statistic is a reasonably smooth function of the observations, the discreteness of the bootstrap distribution is not a problem.
<HR>
<ADDRESS>Susan Holmes 2004-05-19</ADDRESS></BODY></HTML>