>> var(treat)/7
ans =
636.8299
>> sqrt(637)
ans = 25.2389
% 1000 bootstrap replicates of the median; bsample is the course's resampling function
thetab=zeros(1,1000);
for b = 1:1000
    thetab(b)=median(bsample(treat));
end
hist(thetab)
>> sqrt(var(thetab))
ans =
37.7768
>> mean(thetab)
ans =
80.5110
</PRE>This is what the histogram looks like: <BR><IMG
src="Some notation.files/mouse1.gif" width=400> </TD></TR></TBODY></TABLE>
<TABLE width=300>
<TBODY>
<TR>
<TD><B>Control Group</B> <PRE>control=[52 104 146 10 51 30 40 27 46]';
>> median(control)
ans = 46
>> mean(control)
ans = 56.2222
>> var(control)
ans = 1.8042e+03
>> var(control)/length(control)
ans = 200.4660
>> sqrt(200.4660)
ans = 14.1586
% 1000 bootstrap replicates of the median; bsample is the course's resampling function
thetab=zeros(1,1000);
for b = 1:1000
    thetab(b)=median(bsample(control));
end
hist(thetab)
>> sqrt(var(thetab))
ans = 11.9218
>> mean(thetab)
ans = 45.4370
</PRE>This is what the histogram looks like: <BR><IMG
src="Some notation.files/mouse2.gif" width=400> </TD></TR></TBODY></TABLE>
<P>Comparing the two medians, could we use the estimates of the standard errors
to find out whether the difference between the two medians is significant?
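<P>A minimal sketch of one way to approach this question, assuming the two groups are independent so that the squared standard errors add. The numbers are the bootstrap standard errors printed above (they will change slightly from run to run); <TT>zdiff</TT> is a hypothetical helper name, not part of the course code. <PRE>se_treat   = 37.7768;   % bootstrap s.e. of the treatment median (output above)
se_control = 11.9218;   % bootstrap s.e. of the control median (output above)
% s.e. of the difference of the medians, assuming independent groups
se_diff = sqrt(se_treat^2 + se_control^2);
% standardized difference of two medians m1 and m2
zdiff = @(m1, m2) (m1 - m2) / se_diff;
</PRE>For example, <TT>zdiff(median(treat), median(control))</TT> expresses the observed difference in units of its estimated standard error. Under a rough normal approximation, a value well inside (-2, 2) would not suggest a significant difference.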
<P>
<H2><A name=SECTION00252000000000000000>The combinatorics of the bootstrap
distribution</A> </H2>As we noted in class, and as the histograms show, the
main feature of the bootstrap distribution of the median is that it can take on
very few values: in the case of the treatment group, for instance, <IMG height=16
alt=$7$ src="Some notation.files/img48.png" width=14 align=bottom border=0>. The
simple bootstrap always has this discrete character, even when we know the
underlying distribution is continuous. There are ways to fix this, and
in many cases it won't matter, but it is an important feature.
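<P>The discreteness can be checked directly. The sketch below lists the distinct values taken by the bootstrap medians of the control group. It uses <TT>randi</TT> as a stand-in for the course's <TT>bsample</TT>, which is an assumption about what that function does (resampling with replacement). <PRE>control = [52 104 146 10 51 30 40 27 46]';
n = numel(control);
B = 1000;
thetab = zeros(1, B);
for b = 1:B
    idx = randi(n, n, 1);              % n indices drawn with replacement
    thetab(b) = median(control(idx));
end
vals  = unique(thetab);                % distinct values of the bootstrap median
nvals = numel(vals);
</PRE>Because <TT>n</TT> is odd here, every bootstrap median is one of the original observations, so <TT>nvals</TT> can never exceed <TT>n</TT>.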
<H3><A name=SECTION00252100000000000000>How many different bootstrap samples are
there?</A> </H3>By different samples we mean samples that differ as sets, i.e. there
is no difference between the sample <!-- MATH $\{x_1,x_2,\ldots,x_n\}$ --><IMG
height=37 alt=$\{x_1,x_2,\ldots,x_n\}$ src="Some notation.files/img49.png"
width=130 align=middle border=0> and the sample <!-- MATH $\{x_2,x_1,\ldots , x_n \}$ --><IMG
height=37 alt="$\{x_2,x_1,\ldots , x_n \}$" src="Some notation.files/img50.png"
width=130 align=middle border=0>. This holds when the observations are exchangeable or the
statistic of interest is a symmetric function <IMG height=16 alt=$s$
src="Some notation.files/img51.png" width=13 align=bottom border=0> of the
sample: <!-- MATH $\hat{\theta}=s(\mbox{${\cal X}$})$ --><IMG height=45
alt="$\hat{\theta}=s(\mbox{${\cal X}$})$" src="Some notation.files/img52.png"
width=79 align=middle border=0>. <BR>Definition: <BR><A
name=def:exchangeable></A><A name=522></A>The sequence <!-- MATH $(X_1,X_2,\ldots,X_n)$ --><IMG height=37 alt=$(X_1,X_2,\ldots,X_n)$
src="Some notation.files/img53.png" width=140 align=middle border=0> of random
variables is said to be <FONT color=#ff0000>exchangeable</FONT> if the
distribution of the <IMG height=16 alt=$n$ src="Some notation.files/img28.png"
width=16 align=bottom border=0> vector
<!-- MATH $(X_1,X_2,\ldots,X_n)$ --><IMG height=37 alt=$(X_1,X_2,\ldots,X_n)$
src="Some notation.files/img53.png" width=140 align=middle border=0> is the same
as that of <!-- MATH $(X_{\pi(1)},X_{\pi(2)},\ldots,X_{\pi(n)})$ --><IMG
height=37 alt=$(X_{\pi(1)},X_{\pi(2)},\ldots,X_{\pi(n)})$
src="Some notation.files/img54.png" width=196 align=middle border=0>, for <IMG
height=16 alt=$\pi$ src="Some notation.files/img55.png" width=16 align=bottom
border=0> any permutation of <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> elements.
<P>Suppose we condition on the sample of <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> distinct
observations <!-- MATH $\mbox{${\cal X}$}$ --><IMG height=16
alt="$\mbox{${\cal X}$}$" src="Some notation.files/img56.png" width=21
align=bottom border=0>. Then there are as many different resamples as there are ways of
choosing <IMG height=16 alt=$n$ src="Some notation.files/img28.png" width=16
align=bottom border=0> objects out of a set of <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> possible
contenders, with repetitions allowed and order ignored.
<P>At this point it is useful to introduce a new notation for a bootstrap
resample. Up to now we have written a possible resample as, say, <!-- MATH $\mbox{${\cal X}$}^{*b}=\{x_1,x_1,x_3,x_4,x_4\}$ --><IMG height=43
alt="$\mbox{${\cal X}$}^{*b}=\{x_1,x_1,x_3,x_4,x_4\}$"
src="Some notation.files/img57.png" width=203 align=middle border=0>. Because of
the exchangeability/symmetry property we can recode this as the <IMG height=16
alt=$n$ src="Some notation.files/img28.png" width=16 align=bottom border=0>
vector counting the number of occurrences of each of the observations. In this
recoding we have <!-- MATH $\mbox{${\cal X}$}^{*b}=(2,0,1,2,0)$ --><IMG
height=43 alt="$\mbox{${\cal X}$}^{*b}=(2,0,1,2,0)$"
src="Some notation.files/img58.png" width=154 align=middle border=0> and the set
of all bootstrap resamples is the <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> dimensional
<FONT color=#ff0000>simplex</FONT> <A name=def:simplex></A><A name=533></A><BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}C_n=\{(k_1,k_2,\ldots,k_n), k_i \in \N, \sum k_i=n \}\end{displaymath} --><IMG
height=36
alt="\begin{displaymath}C_n=\{(k_1,k_2,\ldots,k_n), k_i \in \N, \sum k_i=n \}\end{displaymath}"
src="Some notation.files/img59.png" width=311 border=0> </DIV><BR clear=all>
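<P>As an illustration of the recoding, the sketch below turns the indices of a resample into its count vector. <TT>accumarray</TT> is a standard MATLAB function; the variable names are only illustrative. <PRE>n   = 5;
idx = [1 1 3 4 4];                     % indices of the resample {x1,x1,x3,x4,x4}
k   = accumarray(idx(:), 1, [n 1])';   % occurrences of each observation
</PRE>Here <TT>k</TT> is the vector (2,0,1,2,0) from the text, and its entries sum to <TT>n</TT>.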
<P></P>Here is the argument I used in class to explain how big <IMG height=35
alt=$C_n$ src="Some notation.files/img60.png" width=27 align=middle border=0>
is. Think of each component of the vector as a box: there are <IMG
height=16 alt=$n$ src="Some notation.files/img28.png" width=16 align=bottom
border=0> boxes to contain <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> balls in
all, and we want to count the number of ways of distributing the n balls
among the <IMG height=16 alt=$n$ src="Some notation.files/img28.png" width=16
align=bottom border=0> boxes. Put down <IMG height=33 alt=$n-1$
src="Some notation.files/img61.png" width=48 align=middle border=0> separators
<IMG height=37 alt=$\vert$ src="Some notation.files/img62.png" width=10
align=middle border=0> to make the boxes, and <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> balls. There
will be <IMG height=33 alt=$2n-1$ src="Some notation.files/img63.png" width=58
align=middle border=0> positions from which to choose the <IMG height=33
alt=$n-1$ src="Some notation.files/img61.png" width=48 align=middle border=0>
bars' positions. For instance, our vector above corresponds to <TT>oo||o|oo|
</TT>. Thus <BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}|C_n|={{2n-1}\choose{n-1}}\end{displaymath} --><IMG
height=54
alt="\begin{displaymath}\vert C_n\vert={{2n-1}\choose{n-1}}\end{displaymath}"
src="Some notation.files/img64.png" width=131 border=0> </DIV><BR clear=all>
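<P>For small n the count can be checked by brute force. The sketch below is only a check, not the argument used in class: it enumerates all n^n ordered index vectors, sorts each one to forget the order of the draws, and counts the distinct results. <PRE>n = 4;
g = cell(1, n);
[g{:}] = ndgrid(1:n);                        % all n^n ordered index vectors
idx = zeros(n^n, n);
for j = 1:n
    idx(:, j) = g{j}(:);
end
distinct  = unique(sort(idx, 2), 'rows');    % one row per distinct resample
ndistinct = size(distinct, 1);
formula   = nchoosek(2*n-1, n-1);
</PRE>For n = 4 both <TT>ndistinct</TT> and <TT>formula</TT> equal 35.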
<P></P>Stirling's formula (<!-- MATH $n!\sim n^ne^{-n}(2\pi n)^{\frac{1}{2}}$ --> <IMG height=46
alt="$n!\sim n^ne^{-n}(2\pi n)^{\frac{1}{2}}$"
src="Some notation.files/img65.png" width=152 align=middle border=0>) gives an
approximation <!-- MATH $C_n \sim (n\pi)^{-\frac{1}{2}} 2^{2n-1}$ --><IMG
height=46 alt="$C_n \sim (n\pi)^{-\frac{1}{2}} 2^{2n-1}$"
src="Some notation.files/img66.png" width=153 align=middle border=0>.
<P>Here is the function file <TT>approxcom.m</TT>: <PRE>function out=approxcom(n)
% Stirling approximation to the number of distinct bootstrap resamples of size n
out=round((pi*n)^(-.5)*2^(2*n-1));
</PRE>that produces the following table of the number of resamples: <BR><IMG
height=48
alt="\begin{array}{\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert} \hl... ...6232& 78207663 & 6.93 10^{10}& 6.35 10^{13} & 5.94 10^{16}\\ \hline \end{array}"
src="Some notation.files/img67.png" width=575 align=bottom border=0> <BR>
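<P>To see how good the approximation is, it can be compared with the exact binomial coefficient. This is a sketch: the anonymous function <TT>approx</TT> simply repeats the formula in <TT>approxcom.m</TT>, and the choice of n values is arbitrary. <PRE>approx = @(n) round((pi*n)^(-.5) * 2^(2*n-1));   % same formula as approxcom.m
ns = [5 10 15 20];
ratio = zeros(size(ns));
for i = 1:numel(ns)
    n = ns(i);
    exact = nchoosek(2*n-1, n-1);                % exact number of resamples
    ratio(i) = approx(n) / exact;
    fprintf('n=%2d  exact=%.0f  approx=%.0f  ratio=%.4f\n', n, exact, approx(n), ratio(i));
end
</PRE>For these values the ratio stays slightly above 1 and moves toward 1 as n grows.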
<P>Are all these samples equally likely? No. Think about the probability of
drawing the sample of all <IMG height=33 alt=$x_1$
src="Some notation.files/img68.png" width=23 align=middle border=0>'s: it requires
choosing the index <IMG height=16 alt=$1$ src="Some notation.files/img69.png"
width=14 align=bottom border=0> <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> times in the
uniform integer generation, so this sample appears only
once in <IMG height=16 alt=$n^{n}$ src="Some notation.files/img70.png" width=25
align=bottom border=0> equally likely draws. By contrast, the sample with <IMG height=33 alt=$x_1$
src="Some notation.files/img68.png" width=23 align=middle border=0> once and
<IMG height=33 alt=$x_2$ src="Some notation.files/img71.png" width=23
align=middle border=0> for all the other observations can appear in <IMG height=16
alt=$n$ src="Some notation.files/img28.png" width=16 align=bottom border=0> out
of the <IMG height=16 alt=$n^{n}$ src="Some notation.files/img70.png" width=25
align=bottom border=0> ways.
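<P>The number of ordered draws that give a particular count vector is the multinomial coefficient. A small sketch, with <TT>nways</TT> a hypothetical helper name: <PRE>% number of ordered draws leading to the count vector k
nways = @(k) round(factorial(sum(k)) / prod(factorial(k)));
n = 5;
w_all_x1 = nways([n 0 0 0 0]);        % every draw is x1
w_one_x1 = nways([1 n-1 0 0 0]);      % x1 once, x2 for the rest
w_orig   = nways(ones(1, n));         % each observation exactly once
</PRE>For n = 5 these are 1, 5 and 120 out of 5^5 = 3125 equally likely ordered draws.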
<H3><A name=SECTION00252200000000000000>Which is the most likely bootstrap
sample?</A> </H3>The most likely resample is the original sample <!-- MATH $\mbox{${\cal X}$}=\{x_1,x_2,...,x_n\}$ --><IMG height=37
alt="$\mbox{${\cal X}$}=\{x_1,x_2,...,x_n\}$"
src="Some notation.files/img72.png" width=162 align=middle border=0>. The
easiest way to see this is to consider:
<H3><A name=SECTION00252300000000000000></A><A name=def:multinomial></A><A
name=558></A><BR>The <FONT color=#ff0000>multinomial</FONT> distribution </H3>In
fact, when we draw bootstrap resamples we are just drawing from the
multinomial distribution a vector <!-- MATH $(k_1,k_2,...k_n)$ --><IMG
height=37 alt=$(k_1,k_2,...k_n)$ src="Some notation.files/img73.png" width=105
align=middle border=0>, with each of the <IMG height=16 alt=$n$
src="Some notation.files/img28.png" width=16 align=bottom border=0> categories
being equally likely, <!-- MATH $p_i=\frac{1}{n}$ --><IMG height=40
alt=$p_i=\frac{1}{n}$ src="Some notation.files/img74.png" width=57 align=middle
border=0>, so that the probability of a possible vector is <BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}Prob_{boot}(k_1,k_2,...k_n)=\frac{n!}{k_1!k_2!\cdots k_n!}(\frac{1}{n})^{k_1+k_2+k_3\cdots k_n}={{n}\choose{k_1,k_2,\ldots,k_n}} n^{-n}\end{displaymath} --><IMG
height=54
alt="\begin{displaymath}Prob_{boot}(k_1,k_2,...k_n)=\frac{n!}{k_1!k_2!\cdots k_n!} (\... ..._1+k_2+k_3\cdots k_n}= {{n}\choose{k_1,k_2,\ldots,k_n}} n^{-n} \end{displaymath}"
src="Some notation.files/img75.png" width=594 border=0> </DIV><BR clear=all>
<P></P>This will be largest when all the <IMG height=35 alt=$k_i$
src="Some notation.files/img76.png" width=20 align=middle border=0>'s are <IMG
height=16 alt=$1$ src="Some notation.files/img69.png" width=14 align=bottom
border=0>. Thus the most likely sample in the bootstrap resampling is the
original sample. Here is a table of the probability of this most likely sample: <BR><IMG height=48
alt="\begin{array}{\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert} \hl... ...} & 5.4\times10^{-5} & 3\times 10^{-6} & 2.3\times 10^{-8}\\ \hline \end{array}"
src="Some notation.files/img77.png" width=528 align=bottom border=0> <BR>As long
as the statistic is a reasonably smooth function of the observations, we can see
that the discreteness of the bootstrap distribution is not a problem.
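<P>The probabilities in this table can be recomputed from the multinomial formula above. The sketch below works on the log scale with <TT>gammaln</TT> to avoid overflow; <TT>pboot</TT> is a hypothetical helper name, and it assumes the counts in <TT>k</TT> sum to the number of categories. <PRE>% probability of the count vector k when n = numel(k) = sum(k)
pboot = @(k) exp(gammaln(sum(k)+1) - sum(gammaln(k+1)) - sum(k)*log(numel(k)));
p_example = pboot([2 0 1 2 0]);        % 30/3125 for the resample coded as (2,0,1,2,0)
p_orig5   = pboot(ones(1, 5));         % 120/3125, the original sample for n = 5
p_orig    = [pboot(ones(1,12)) pboot(ones(1,15)) pboot(ones(1,20))];
</PRE>The three entries of <TT>p_orig</TT>, for n = 12, 15 and 20, are about 5.4e-5, 3.0e-6 and 2.3e-8, which appear to match the last three entries of the table.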
<HR>
<ADDRESS>Susan Holmes 2004-05-19 </ADDRESS></BODY></HTML>