&gt;&gt; var(treat)/7
ans =
  636.8299
&gt;&gt; sqrt(637)
ans =   25.2389
thetab=zeros(1,1000);                    % one bootstrap median per replication
for b = 1:1000
    thetab(b)=median(bsample(treat));    % median of one bootstrap resample of treat
end
hist(thetab)
&gt;&gt; sqrt(var(thetab))
ans =
   37.7768
&gt;&gt; mean(thetab)
ans =
   80.5110
</PRE>This is what the histogram looks like: <BR><IMG 
      src="Some notation.files/mouse1.gif" width=400> </TD></TR></TBODY></TABLE>
<TABLE width=300>
  <TBODY>
  <TR>
    <TD><B>Control Group</B> <PRE>control=[52 104 146 10 51 30 40 27 46]';
&gt;&gt; median(control)
ans =    46
&gt;&gt; mean(control)
ans =   56.2222
&gt;&gt; var(control)
ans =   1.8042e+03
&gt;&gt; var(control)/length(control)
ans =  200.4660
&gt;&gt; sqrt(200.4660)
ans =   14.1586
thetab=zeros(1,1000);                    % one bootstrap median per replication
for b = 1:1000
    thetab(b)=median(bsample(control));  % median of one bootstrap resample of control
end
hist(thetab)
&gt;&gt; sqrt(var(thetab))
ans =   11.9218
&gt;&gt; mean(thetab)
ans =   45.4370
</PRE>This is what the histogram looks like: <BR><IMG 
      src="Some notation.files/mouse2.gif" width=400> </TD></TR></TBODY></TABLE>
<P>To compare the two medians, could we use the estimated standard errors 
to find out whether the difference between them is significant? 
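<P>One rough way to make this comparison concrete is to divide the difference of 
the two medians by a combined standard error. This is only a sketch: it assumes 
the two groups are independent and that a normal approximation is acceptable, 
neither of which is established above, and the function name <TT>zmed</TT> is 
hypothetical. <PRE>function z=zmed(med1,se1,med2,se2)
% ZMED (hypothetical helper) difference of two medians in standard error units.
% Assumes independent groups, so the variances of the two estimates add.
z=(med1-med2)/sqrt(se1^2+se2^2);
</PRE>With the bootstrap standard errors reported above one could call 
<TT>zmed(median(treat),37.7768,median(control),11.9218)</TT>; a ratio well 
beyond 2 in absolute value would suggest a real difference. 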
<P>
<H2><A name=SECTION00252000000000000000>The combinatorics of the bootstrap 
distribution</A> </H2>As we noted in class, and as the histograms show, the 
main feature of the bootstrap distribution of the median is that it can take on 
very few values: only <IMG height=16 
alt=$7$ src="Some notation.files/img48.png" width=14 align=bottom border=0> in 
the case of the treatment group, for instance. The simple bootstrap always shows 
this discreteness, even when we know the underlying distribution is continuous. 
There are ways to fix this, and in many cases it won't matter, but it is an 
important feature. 
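<P>The discreteness is easy to check numerically. The sketch below uses the 
control data from above and draws the resamples with <TT>randi</TT>, which is an 
assumption standing in for the course function <TT>bsample</TT> (its 
implementation is not shown here). <PRE>control=[52 104 146 10 51 30 40 27 46]';
n=length(control);
B=1000;
thetab=zeros(1,B);
for b = 1:B
    idx=randi(n,n,1);                  % n indices drawn uniformly with replacement
    thetab(b)=median(control(idx));    % median of one resample
end
vals=unique(thetab)                    % distinct values taken by the bootstrap median
numel(vals)                            % never more than n when n is odd
</PRE>With 9 observations the median of a resample is always one of the 9 
observed values, so <TT>numel(vals)</TT> cannot exceed 9. 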
<H3><A name=SECTION00252100000000000000>How many different bootstrap samples are 
there?</A> </H3>By different samples we mean samples that differ as unordered 
collections, i.e. there is no difference between the sample <!-- MATH $\{x_1,x_2,\ldots,x_n\}$ --><IMG 
height=37 alt=$\{x_1,x_2,\ldots,x_n\}$ src="Some notation.files/img49.png" 
width=130 align=middle border=0> and the sample <!-- MATH $\{x_2,x_1,\ldots , x_n \}$ --><IMG 
height=37 alt="$\{x_2,x_1,\ldots , x_n \}$" src="Some notation.files/img50.png" 
width=130 align=middle border=0>. This is the case when the observations are 
exchangeable or the statistic of interest is a symmetric function <IMG height=16 alt=$s$ 
src="Some notation.files/img51.png" width=13 align=bottom border=0> of the 
sample: <!-- MATH $\hat{\theta}=s(\mbox{${\cal X}$})$ --><IMG height=45 
alt="$\hat{\theta}=s(\mbox{${\cal X}$})$" src="Some notation.files/img52.png" 
width=79 align=middle border=0>. <BR>Definition: <BR><A 
name=def:exchangeable></A><A name=522></A>The sequence <!-- MATH $(X_1,X_2,\ldots,X_n)$ --><IMG height=37 alt=$(X_1,X_2,\ldots,X_n)$ 
src="Some notation.files/img53.png" width=140 align=middle border=0> of random 
variables is said to be <FONT color=#ff0000>exchangeable</FONT> if the 
distribution of the <IMG height=16 alt=$n$ src="Some notation.files/img28.png" 
width=16 align=bottom border=0> vector 
<!-- MATH $(X_1,X_2,\ldots,X_n)$ --><IMG height=37 alt=$(X_1,X_2,\ldots,X_n)$ 
src="Some notation.files/img53.png" width=140 align=middle border=0> is the same 
as that of <!-- MATH $(X_{\pi(1)},X_{\pi(2)},\ldots,X_{\pi(n)})$ --><IMG 
height=37 alt=$(X_{\pi(1)},X_{\pi(2)},\ldots,X_{\pi(n)})$ 
src="Some notation.files/img54.png" width=196 align=middle border=0>, for <IMG 
height=16 alt=$\pi$ src="Some notation.files/img55.png" width=16 align=bottom 
border=0> any permutation of <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> elements. 
<P>Suppose we condition on the sample of <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> distinct 
observations <!-- MATH $\mbox{${\cal X}$}$ --><IMG height=16 
alt="$\mbox{${\cal X}$}$" src="Some notation.files/img56.png" width=21 
align=bottom border=0>. Then there are as many different resamples as there are ways of 
choosing <IMG height=16 alt=$n$ src="Some notation.files/img28.png" width=16 
align=bottom border=0> objects out of a set of <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> possible 
contenders, repetitions being allowed and order being ignored. 
<P>At this point it is useful to introduce a new notation for a bootstrap 
resample. Up to now we have written a possible resample as, say, <!-- MATH $\mbox{${\cal X}$}^{*b}=\{x_1,x_1,x_3,x_4,x_4\}$ --><IMG height=43 
alt="$\mbox{${\cal X}$}^{*b}=\{x_1,x_1,x_3,x_4,x_4\}$" 
src="Some notation.files/img57.png" width=203 align=middle border=0>. Because of 
the exchangeability/symmetry property we can recode this as the <IMG height=16 
alt=$n$ src="Some notation.files/img28.png" width=16 align=bottom border=0> 
vector counting the number of occurrences of each of the observations. In this 
recoding we have <!-- MATH $\mbox{${\cal X}$}^{*b}=(2,0,1,2,0)$ --><IMG 
height=43 alt="$\mbox{${\cal X}$}^{*b}=(2,0,1,2,0)$" 
src="Some notation.files/img58.png" width=154 align=middle border=0>, and the set 
of all bootstrap resamples is the <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> dimensional 
<FONT color=#ff0000>simplex</FONT> <A name=def:simplex></A><A name=533></A><BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}C_n=\{(k_1,k_2,\ldots,k_n), k_i \in \N, \sum k_i=n \}\end{displaymath} --><IMG 
height=36 
alt="\begin{displaymath}C_n=\{(k_1,k_2,\ldots,k_n), k_i \in \N, \sum k_i=n \}\end{displaymath}" 
src="Some notation.files/img59.png" width=311 border=0> </DIV><BR clear=all>
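<P>The recoding of a resample as a vector of counts can be sketched in a few 
lines. The variable names are illustrative; <TT>idx</TT> holds the indices of 
the observations drawn in the example resample above. <PRE>n=5;
idx=[1 1 3 4 4];                     % indices of the resample {x1,x1,x3,x4,x4}
k=accumarray(idx(:),1,[n 1])'        % occurrence counts: 2 0 1 2 0
sum(k)                               % the counts always add up to n
</PRE>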
<P></P>Here is the argument I used in class to explain how big <IMG height=35 
alt=$C_n$ src="Some notation.files/img60.png" width=27 align=middle border=0> 
is. Each component of the vector is considered to be a box: there are <IMG 
height=16 alt=$n$ src="Some notation.files/img28.png" width=16 align=bottom 
border=0> boxes to contain <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> balls in 
all, and we want to count the number of ways of distributing the n balls 
among the <IMG height=16 alt=$n$ src="Some notation.files/img28.png" width=16 
align=bottom border=0> boxes. Put down <IMG height=33 alt=$n-1$ 
src="Some notation.files/img61.png" width=48 align=middle border=0> separators 
<IMG height=37 alt=$\vert$ src="Some notation.files/img62.png" width=10 
align=middle border=0> to make the boxes, and <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> balls. There 
are <IMG height=33 alt=$2n-1$ src="Some notation.files/img63.png" width=58 
align=middle border=0> positions in all, from which we choose the <IMG height=33 
alt=$n-1$ src="Some notation.files/img61.png" width=48 align=middle border=0> 
positions of the bars. For instance, our vector above corresponds to <TT>oo||o|oo| 
</TT>. Thus <BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}|C_n|={{2n-1}\choose{n-1}}\end{displaymath} --><IMG 
height=54 
alt="\begin{displaymath}\vert C_n\vert={{2n-1}\choose{n-1}}\end{displaymath}" 
src="Some notation.files/img64.png" width=131 border=0> </DIV><BR clear=all>
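<P>For a small case the count can be checked by brute force. The sketch below 
takes n = 3 (an arbitrary choice for illustration), lists all ordered draws, 
forgets the order within each draw and compares the number of distinct 
resamples with the formula. <PRE>n=3;
[a,b,c]=ndgrid(1:n);                        % all n^n = 27 ordered draws of 3 indices
draws=[a(:) b(:) c(:)];
resamples=unique(sort(draws,2),'rows');     % forget the order within each draw
nDistinct=size(resamples,1)                 % 10 distinct resamples
nFormula=nchoosek(2*n-1,n-1)                % 10 as well
</PRE>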
<P></P>Stirling's formula (<!-- MATH $n!\sim n^ne^{-n}(2\pi n)^{\frac{1}{2}}$ --> <IMG height=46 
alt="$n!\sim n^ne^{-n}(2\pi n)^{\frac{1}{2}}$" 
src="Some notation.files/img65.png" width=152 align=middle border=0>) gives an 
approximation to the size of this set: <!-- MATH $|C_n| \sim (n\pi)^{-\frac{1}{2}} 2^{2n-1}$ --><IMG 
height=46 alt="$\vert C_n\vert \sim (n\pi)^{-\frac{1}{2}} 2^{2n-1}$" 
src="Some notation.files/img66.png" width=153 align=middle border=0>. 
<P>Here is the function file <TT>approxcom.m</TT> <PRE>function out=approxcom(n)
% APPROXCOM Stirling approximation to nchoosek(2*n-1,n-1), the number of resamples.
out=round((pi*n)^(-.5)*2^(2*n-1));
</PRE>that produces the following table of the number of resamples: <BR><IMG 
height=48 
alt="\begin{array}{\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert}&#10;\hl...&#10;...6232&amp; 78207663 &amp; 6.93 10^{10}&amp; 6.35 10^{13} &amp;&#10;5.94 10^{16}\\&#10;\hline&#10;\end{array}" 
src="Some notation.files/img67.png" width=575 align=bottom border=0> <BR>
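<P>To get a feeling for the quality of the approximation, one can print it 
next to the exact binomial coefficient. This is a sketch only: the values of n 
are chosen for illustration and kept small enough for <TT>nchoosek</TT> to stay 
exact in double precision. <PRE>for n = [5 10 15 20]
    exact=nchoosek(2*n-1,n-1);               % exact number of resamples
    approx=(pi*n)^(-.5)*2^(2*n-1);           % same expression as in approxcom, before rounding
    fprintf('%3d %14.0f %14.0f %8.4f\n',n,exact,approx,approx/exact);
end
</PRE>For n = 5 the exact count is 126 while the approximation gives about 129; 
the ratio in the last column moves towards 1 as n grows. 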
<P>Are all these samples equally likely? Thinking about the probability of 
drawing the sample of all <IMG height=33 alt=$x_1$ 
src="Some notation.files/img68.png" width=23 align=middle border=0>'s, by 
choosing the index <IMG height=16 alt=$1$ src="Some notation.files/img69.png" 
width=14 align=bottom border=0> <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> times in the 
integer uniform generation, should persuade you that they are not: this sample 
appears only once in <IMG height=16 alt=$n^{n}$ src="Some notation.files/img70.png" width=25 
align=bottom border=0> equally likely ordered draws, whereas the sample with <IMG height=33 alt=$x_1$ 
src="Some notation.files/img68.png" width=23 align=middle border=0> once and 
<IMG height=33 alt=$x_2$ src="Some notation.files/img71.png" width=23 
align=middle border=0> for all the other observations can appear in <IMG height=16 
alt=$n$ src="Some notation.files/img28.png" width=16 align=bottom border=0> out 
of the <IMG height=16 alt=$n^{n}$ src="Some notation.files/img70.png" width=25 
align=bottom border=0> ways. 
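<P>These two counts can be verified by enumeration in a small case. The sketch 
below takes n = 3 (chosen only to keep the list short) and counts, among all 
ordered draws, those that give each of the two samples. <PRE>n=3;
[a,b,c]=ndgrid(1:n);
draws=[a(:) b(:) c(:)];                   % the n^n = 27 equally likely ordered draws
n1=sum(draws==1,2);                       % how many times x1 appears in each draw
n2=sum(draws==2,2);                       % how many times x2 appears in each draw
allX1=sum(n1==n)                          % draws giving {x1,x1,x1}: 1
oneX1=sum((n1==1).*(n2==n-1))             % draws giving {x1,x2,x2}: n = 3
</PRE>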
<H3><A name=SECTION00252200000000000000>Which is the most likely bootstrap 
sample?</A> </H3>The most likely resample is the original sample <!-- MATH $\mbox{${\cal X}$}=\{x_1,x_2,...,x_n\}$ --><IMG height=37 
alt="$\mbox{${\cal X}$}=\{x_1,x_2,...,x_n\}$" 
src="Some notation.files/img72.png" width=162 align=middle border=0>. The 
easiest way to see this is to consider: 
<H3><A name=SECTION00252300000000000000></A><A name=def:multinomial></A><A 
name=558></A><BR>The <FONT color=#ff0000>multinomial</FONT> distribution </H3>In 
fact, when we draw bootstrap resamples we are just drawing from the 
multinomial distribution a vector <!-- MATH $(k_1,k_2,...k_n)$ --><IMG 
height=37 alt=$(k_1,k_2,...k_n)$ src="Some notation.files/img73.png" width=105 
align=middle border=0>, with each of the <IMG height=16 alt=$n$ 
src="Some notation.files/img28.png" width=16 align=bottom border=0> categories 
being equally likely, <!-- MATH $p_i=\frac{1}{n}$ --><IMG height=40 
alt=$p_i=\frac{1}{n}$ src="Some notation.files/img74.png" width=57 align=middle 
border=0>, so that the probability of a possible vector is <BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}Prob_{boot}(k_1,k_2,...k_n)=\frac{n!}{k_1!k_2!\cdots k_n!}(\frac{1}{n})^{k_1+k_2+k_3\cdots k_n}={{n}\choose{k_1,k_2,\ldots,k_n}} n^{-n}\end{displaymath} --><IMG 
height=54 
alt="\begin{displaymath}Prob_{boot}(k_1,k_2,...k_n)=\frac{n!}{k_1!k_2!\cdots k_n!}&#10;(\...&#10;..._1+k_2+k_3\cdots k_n}=&#10;{{n}\choose{k_1,k_2,\ldots,k_n}} n^{-n}&#10;\end{displaymath}" 
src="Some notation.files/img75.png" width=594 border=0> </DIV><BR clear=all>
<P></P>This will be largest when all the <IMG height=35 alt=$k_i$ 
src="Some notation.files/img76.png" width=20 align=middle border=0>'s are <IMG 
height=16 alt=$1$ src="Some notation.files/img69.png" width=14 align=bottom 
border=0>, since the product of factorials in the denominator is then as small 
as possible. Thus the most likely sample in bootstrap resampling is the 
original sample. Here is the table of the probability of this most likely sample: <BR><IMG height=48 
alt="\begin{array}{\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert l\vert}&#10;\hl...&#10;...} &amp; 5.4\times10^{-5} &amp; 3\times 10^{-6} &amp;&#10;2.3\times 10^{-8}\\&#10;\hline&#10;\end{array}" 
src="Some notation.files/img77.png" width=528 align=bottom border=0> <BR>Even 
the most likely resample has a very small probability, so as long as the 
statistic is a reasonably smooth function of the observations, we can see 
that the discreteness of the bootstrap distribution is not a problem. 
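<P>The multinomial probability above is short to code. The function file below 
is a sketch; the name <TT>bootprob</TT> is hypothetical, and the direct use of 
<TT>factorial</TT> is only meant for moderate n. <PRE>function p=bootprob(k)
% BOOTPROB (hypothetical helper) probability of the bootstrap resample
% whose vector of occurrence counts is k, with n = sum(k) equally likely categories.
n=sum(k);
p=factorial(n)/prod(factorial(k))/n^n;
</PRE>A few calls for n = 5 show how unequal the probabilities are: <PRE>bootprob([5 0 0 0 0])      % all x1's:            1/3125 = 0.00032
bootprob([1 4 0 0 0])      % x1 once, x2 four times: 5/3125 = 0.0016
bootprob([2 0 1 2 0])      % the example resample:   30/3125 = 0.0096
bootprob([1 1 1 1 1])      % the original sample:   120/3125 = 0.0384
</PRE>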
<HR>
<ADDRESS>Susan Holmes 2004-05-19 </ADDRESS></BODY></HTML>