width=434 border=0> </DIV><BR clear=all>
<P></P>So the sampling distribution of the maximum tends to be exponential, 
and it depends on the unknown parameter. 
<P>I backed this up with a simulation experiment. I generated random uniforms 
on (0,1) and multiplied them by 5 to simulate a U(0,5) distribution. I 
then drew many samples of size 15 from this distribution and took their 
maxima. <PRE>&gt;&gt; test1=rand(3,10)
test1 =
  Columns 1 through 6
0.5804  0.8833  0.3863  0.2362  0.2711  0.6603   
0.7468  0.1463  0.9358  0.9170  0.8861  0.1625   
0.0295  0.8030  0.7004  0.2994  0.4198  0.0537   
  Columns 7 through 10 
 0.3936   0.6143    0.8899    0.2002
 0.4927   0.4924    0.0578    0.3766
 0.2206   0.8396    0.2538    0.0489
&gt;&gt; [v,i]=max(test1)
v =
Columns 1 through 7 
0.7468 0.8833 0.9358 0.9170 0.8861 0.6603 0.4927
Columns 8 through 10 
 0.8396    0.8899   0.3766
i =
  2   1    2    2     2     1   2   3   1   2

----------------------------------------------
function v=smaxi(B,n,maxi)
%Simulation of the uniform distribution on (0,maxi)
%with estimation of the maximum by the sample maximum
%B simulated samples, each of size n (one sample per column)
rands=maxi*rand(n,B);
v=max(rands);   %column-wise maxima, a 1-by-B row vector
--------------------------------------------

mm=smaxi(10000,15,5);
hist(mm,40)
</PRE>
<P>This is what the histogram looks like: <BR><IMG 
src="More about the theoretical underpinnings of the Bootstrap.files/pmax2.gif" 
width=400> 
<P>As can be seen, although the sample size is not very big, the sampling 
distribution is already quite close to exponential. 
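<P>The closeness to the exponential can also be checked numerically. The 
following is a minimal sketch, not part of the original notes: the names theta, 
tgrid, emp, exact and limit are illustrative assumptions. It rescales the gap 
between the upper bound and each simulated maximum, then compares its tail 
probabilities with the exact finite-sample value (1-t/n)^n and with the limit 
exp(-t). <PRE>% Sketch only: variable names are assumptions made for illustration
theta=5; n=15; B=10000;
mm=max(theta*rand(n,B));                  % B maxima, one per simulated sample
t=n*(theta-mm)/theta;                     % rescaled gap below the upper bound
tgrid=[0.5 1 2];                          % points at which to compare the tails
emp=arrayfun(@(s) mean(gt(t,s)),tgrid);   % empirical P(t exceeds s)
exact=(1-tgrid/n).^n;                     % exact tail probability for finite n
limit=exp(-tgrid);                        % exponential limit as n grows
disp([tgrid; emp; exact; limit])
</PRE>
<P>For n=15 the exact and limiting rows differ by less than 0.02 at these three 
points, which is consistent with the shape of the histogram. 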
<P>In the bootstrap example, we start with a given sample, sample1, that we 
resample 10,000 times, computing 10,000 maxima. We do not use loops but rather 
matrices, which are more efficient in both S-PLUS and MATLAB. 
<TABLE width=311>
  <TBODY>
  <TR>
    <TD><B>Nonparametric Bootstrap</B> <PRE>&gt;&gt; sample1=5*rand(15,1);
&gt;&gt; sample1'
ans =
  Columns 1 through 7 
1.1892 3.0412 4.4574 4.6107 4.6531 1.6721 3.9533
  Columns 8 through 14 
2.4167 2.7374 1.9671 2.7069 1.6799 2.4536 4.0456
  Column 15 
3.2760
&gt;&gt; [v,i]=max(sample1)
v =    4.6531
i =     5
&gt;&gt; n=15; B=10;
&gt;&gt; indices=randint(n,B,n)+1   %integers in 1..n; randi(n,n,B) does the same in current MATLAB
indices =
   13    1   13    1    7   5     9   10    9    7
    2    6    1    6    8   3     1    3    2    4
   13   11    1    1    6   1    14   15    1    7
    6   10    1    5    9   7     4    1    5    5
   10   12    3    2    7   4    15   11   15   13
    2    2    3   14   12  10     8    4    9    9
   10   14    3    1    6  15     9   10    6    9
    2    6    6    9    1  11     3   13   12    8
    1    2    9    6   13   4     7    1   14   15
    9    9   14    6    1   9    14    3   15    3
    5   13    5    6   11  11    15    4    4   13
   10    8   12   14   14   7     7    4    9    7
    2    2    4   11    5   1     4    2    3    4
   13    5   10    7    8   4    13    3    9    1
   15    7    2    1    7   9    11   12    6    6
&gt;&gt; [out,i]=max(sample1(indices))
out =
  Columns 1 through 7 
4.6531    4.6531    4.6531  4.6531  4.6531  4.6531  4.6107
  Columns 8 through 10 
4.6107    4.6531    4.6531
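function out=bmax(B,orig)
%Function to bootstrap the maximum
%orig is the original sample (n-by-1), B the number of resamples
     [n,p]=size(orig);
     indices=randint(n,B,n)+1;      %n-by-B indices in 1..n, one resample per column
     [out,i]= max(orig(indices));   %out holds the B bootstrap maxima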
</PRE>
      <P>This is what the histogram looks like: <BR><IMG 
      src="More about the theoretical underpinnings of the Bootstrap.files/nmax.gif" 
      width=400> <BR>This shows a definite point mass at the sample maximum. 
      In fact, we can prove that the bootstrap maximum has a point mass at the 
      sample maximum that stays large whatever the sample size. The two are 
      equal whenever the largest observation is drawn at least once, so: <BR>
      <P></P>
      <DIV align=center><!-- MATH \begin{displaymath}P(X_{(n)}^*=X_{(n)})=P(X_{(n)} \in\mbox{${\cal X}$}_n^*)=1-(1-\frac{1}{n})^n\simeq 1-e^{-1}\end{displaymath} --><IMG 
      height=45 
      alt="\begin{displaymath}P(X_{(n)}^*=X_{(n)})=P(X_{(n)} \in&#10;\mbox{${\cal X}$}_n^*)=1-(1-\frac{1}{n})^n\simeq 1-e^{-1}\end{displaymath}" 
      src="More about the theoretical underpinnings of the Bootstrap.files/img210.png" 
      width=540 border=0> </DIV><BR clear=all>
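      <P>The size of this point mass can be checked by simulation. The following 
      is a minimal sketch rather than code from the notes: it assumes randi is 
      available (it replaces randint in current MATLAB), and the names bmaxes, 
      frac and theory are illustrative. <PRE>% Sketch only: compares the observed point mass with 1-(1-1/n)^n
n=15; B=10000;
orig=5*rand(n,1);                   % a fresh U(0,5) sample of size 15
indices=randi(n,n,B);               % n-by-B resampling indices in 1..n
bmaxes=max(orig(indices));          % B bootstrap maxima, one per column
frac=mean(bmaxes==max(orig));       % observed point mass at the sample maximum
theory=1-(1-1/n)^n;                 % about 0.645 for n=15
disp([frac theory])
</PRE>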
      <P></P></TD></TR></TBODY></TABLE>There are several ways to fix this. Jo Romano 
has suggested a <IMG height=35 alt=$k-out-of-n$ 
src="More about the theoretical underpinnings of the Bootstrap.files/img211.png" 
width=143 align=middle border=0> bootstrap. We could also use the extra 
information that we assumed the form of the distribution to be known, 
i.e. Uniform, although its parameter is unknown. This is also a plug-in 
method, but it is called the parametric bootstrap. 
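<P>One reading of the k-out-of-n idea is to draw resamples of size k smaller 
than n. The sketch below is an assumption about how that could be coded, not 
the procedure from the notes or from Romano's work: it draws with replacement, 
the choice k=5 is arbitrary, and bmaxk is a hypothetical name. The chance that 
the largest observation appears in a resample drops to 1-(1-1/n)^k, so the 
point mass shrinks when k is small relative to n. <PRE>% Sketch only: resamples of size k instead of n (k at least 2)
n=15; B=10000; k=5;
orig=5*rand(n,1);                   % a U(0,5) sample of size 15
indices=randi(n,k,B);               % k-by-B resampling indices in 1..n
bmaxk=max(orig(indices),[],1);      % B maxima of resamples of size k
frac=mean(bmaxk==max(orig));        % observed point mass at the sample maximum
theory=1-(1-1/n)^k;                 % about 0.29 for n=15, k=5
disp([frac theory])
</PRE>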
<H2><A name=SECTION00293000000000000000>Parametric Bootstrap</A> </H2>
<P>Knowing that the distribution function <IMG height=16 alt=$F$ 
src="More about the theoretical underpinnings of the Bootstrap.files/img1.png" 
width=19 align=bottom border=0> is restricted to a certain parametric family can 
help a lot. 
<P>
<H3><A name=SECTION00293100000000000000>Maximum</A> </H3>
<P>Suppose that we want to bootstrap the maximum, still knowing that the sample 
was drawn from a uniform with unknown upper bound <IMG height=17 alt=$\theta$ 
src="More about the theoretical underpinnings of the Bootstrap.files/img10.png" 
width=14 align=bottom border=0>. We would be better off generating many samples 
from the Uniform(0,<IMG height=22 alt=$\hat{\theta}$ 
src="More about the theoretical underpinnings of the Bootstrap.files/img212.png" 
width=14 align=bottom border=0>) and looking at the distribution of the maximum. 
There will of course be a slight bias to the left, but the distribution will be 
of the right form. 
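<P>A minimal sketch of this parametric bootstrap of the maximum follows. It is 
not taken from the notes: it assumes the sample maximum is used as the plug-in 
estimate of the upper bound, and the names thetahat, pmax and frac are 
illustrative. <PRE>% Sketch only: parametric bootstrap of the maximum of a uniform sample
n=15; B=10000;
orig=5*rand(n,1);                   % stand-in for the observed sample
thetahat=max(orig);                 % plug-in estimate of the upper bound
pmax=max(thetahat*rand(n,B));       % B maxima of samples from U(0,thetahat)
frac=mean(pmax==thetahat);          % share of maxima equal to thetahat
disp(frac)
hist(pmax,40)
</PRE>
<P>Unlike the nonparametric version, the simulated maxima are continuous, so 
essentially none of them coincide with the sample maximum, and all of them lie 
below it, which is the bias to the left. 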
<H3><A name=SECTION00293200000000000000>Correlation Coefficient</A> </H3>Suppose 
that in the law school data, the random variables <IMG height=33 alt=$y,z$ 
src="More about the theoretical underpinnings of the Bootstrap.files/img213.png" 
width=32 align=middle border=0> are known to be Normal, with some unknown 
covariance structure, with correlation coefficient <IMG height=33 alt=$\rho$ 
src="More about the theoretical underpinnings of the Bootstrap.files/img214.png" 
width=14 align=middle border=0>. 
<P>Instead of creating new data by resampling, we can generate samples from the 
bivariate Normal. However, we have to plug in an estimate of the 
variance/covariance structure obtained from the original data. 
<P><B>Parametric Simulation</B> <PRE>&gt;&gt; c=sqrt((1-.776^2)/.776^2)
c =    0.8128
function [ys,zs]=gennorm(B,my,mz,sy,sz,c)
%Simulation of the bivariate normal distribution
%with sy,sz the standard deviations and rho the correlation
%c=sqrt((1-rho^2)/rho^2)
%and (my,mz) as the means
%B simulations
rs=randn(B,2);
r1=rs(:,1);
r2=rs(:,2);
ys=my+sy*r1;
%var(r1+c*r2)=1+c^2, so divide by sqrt(1+c^2) to give zs standard deviation sz
zs=mz+(sz/sqrt(1+c^2))*(r1+c*r2);
&gt;&gt; [ys,zs]=gennorm(1000,my,mz,sy,sz,c);
&gt;&gt; corrcoef(ys,zs)
ans =
    1.0000    0.7558
    0.7558    1.0000
</PRE>
<P>This is what the points look like: <BR><IMG 
src="More about the theoretical underpinnings of the Bootstrap.files/normscat.gif" 
width=500> <BR>
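<P>Another way to draw from a bivariate Normal, shown here only as a sketch and 
not as the method used in the notes, is to multiply standard normals by a 
Cholesky factor of the plugged-in covariance matrix. The values of my, mz, sy, 
sz and rho are copied from the MATLAB output on this page; Sigma, R, X and cc 
are illustrative names. <PRE>% Sketch only: bivariate normal draws through a Cholesky factor
my=600.2667; mz=3.0947; sy=41.7945; sz=0.2435; rho=0.776;
B=1000;
Sigma=[sy^2 rho*sy*sz; rho*sy*sz sz^2];   % plug-in covariance matrix
R=chol(Sigma);                            % upper triangular, R'*R equals Sigma
X=randn(B,2)*R;                           % each row has covariance Sigma
ys=my+X(:,1);
zs=mz+X(:,2);
cc=corrcoef(ys,zs);
disp(cc(1,2))                             % should be close to rho
</PRE>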
<P><B>Parametric Bootstrap Simulations</B> <PRE>&gt;&gt; y=law15(:,1);
&gt;&gt; z=law15(:,2);
&gt;&gt; mz=mean(z)
mz =
    3.0947
&gt;&gt; my=mean(y)
my =
  600.2667
&gt;&gt; var(z)
ans =    0.0593
&gt;&gt; var(y)
ans =   1.7468e+03
&gt;&gt; corrcoef(y,z)
ans =
    1.0000    0.7764
    0.7764    1.0000
&gt;&gt; cov(y,z)
ans =
   1.0e+03 *
    1.7468    0.0079
    0.0079    0.0001
&gt;&gt; sy=sqrt(var(y))
sy =   41.7945
&gt;&gt; sz=sqrt(var(z))
sz =
    0.2435
&gt;&gt; corrs=zeros(1,1000);
&gt;&gt; for b=1:1000
     [ys,zs]=gennorm(15,my,mz,sy,sz,c);   %one parametric sample of size 15
     cor=corrcoef(ys,zs);
     corrs(b)=cor(1,2);                   %keep the off-diagonal correlation
   end
&gt;&gt; hist(corrs,40)
</PRE>
<P>This is what the histogram looks like: <BR><IMG 
src="More about the theoretical underpinnings of the Bootstrap.files/pcorr.gif" 
width=500> <BR>
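<P>The same simulation can be written without a loop, in the matrix style used 
for the maximum. This is a minimal sketch and not code from the notes: the 
means, standard deviations and rho are copied from the output above, bsxfun is 
used so that it also runs on older MATLAB versions, and the names yc, zc and se 
are illustrative. <PRE>% Sketch only: loop-free parametric bootstrap of the correlation
my=600.2667; mz=3.0947; sy=41.7945; sz=0.2435; rho=0.776;
n=15; B=1000;
r1=randn(n,B); r2=randn(n,B);             % one simulated sample per column
ys=my+sy*r1;
zs=mz+sz*(rho*r1+sqrt(1-rho^2)*r2);       % sd sz, correlation rho with ys
yc=bsxfun(@minus,ys,mean(ys));            % centre each column
zc=bsxfun(@minus,zs,mean(zs));
corrs=sum(yc.*zc)./sqrt(sum(yc.^2).*sum(zc.^2));   % B sample correlations
se=std(corrs);                            % parametric bootstrap standard error
disp(se)
hist(corrs,40)
</PRE>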
<P>Here you can compare with the 'true' sampling distribution for the law school 
data, obtained by sampling without replacement 100,000 times from the original 
population of 82 observations. 
<P><IMG 
src="More about the theoretical underpinnings of the Bootstrap.files/true82.gif" 
width=500> 
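<P>The 82 law school observations are not reproduced on this page, so the 
sketch below uses a simulated stand-in population purely to show the mechanics 
of sampling without replacement. Everything in it is an assumption made for 
illustration: the placeholder matrix pop, the sample size n=15 (chosen to match 
law15), the names truecorrs and idx, and the reduced number of draws. <PRE>% Sketch only: pop is a simulated placeholder, NOT the real law school data
N=82; n=15; B=2000;                       % B kept small; the notes use 100,000
rho=0.776;
r1=randn(N,1); r2=randn(N,1);
pop=[600+40*r1, 3+0.25*(rho*r1+sqrt(1-rho^2)*r2)];   % placeholder population
truecorrs=zeros(1,B);
for b=1:B
    idx=randperm(N);
    idx=idx(1:n);                         % n distinct rows: no replacement
    cc=corrcoef(pop(idx,1),pop(idx,2));
    truecorrs(b)=cc(1,2);
end
hist(truecorrs,40)
</PRE>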
<P>
<HR>
<ADDRESS>Susan Holmes 2004-05-19 </ADDRESS></BODY></HTML>