\begin{displaymath}x^{(k)}_{ij} = \sum^k_{t=1}u_{it}s_t v_{jt}\end{displaymath} --><IMG
height=58
alt="\begin{displaymath} x^{(k)}_{ij} = \sum^k_{t=1}u_{it}s_t v_{jt} \end{displaymath}"
src="Bootstrapping a Principal Component Analysis.files/img250.png" width=135
border=0> </DIV><BR clear=all>
<P></P>is the best (least-squares) approximation of rank <IMG height=17 alt=$k$
src="Bootstrapping a Principal Component Analysis.files/img251.png" width=15
align=bottom border=0> for <IMG height=16 alt=$X$
src="Bootstrapping a Principal Component Analysis.files/img162.png" width=22
align=bottom border=0>.
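<P>A short NumPy sketch of this truncated-SVD reconstruction (the matrix and the rank are illustrative, not from the handout):</P>

```python
# Rebuild x^(k)_{ij} = sum_{t=1}^{k} u_{it} s_t v_{jt} from the SVD and
# check the Eckart-Young property: the approximation error in Frobenius
# norm equals the root sum of the discarded squared singular values.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))            # illustrative data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation

err = np.linalg.norm(X - Xk, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))  # True
```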
<P>The columns of <IMG height=16 alt=$V$
src="Bootstrapping a Principal Component Analysis.files/img252.png" width=20
align=bottom border=0> are the directions along which the variances are maximal.
Definition: <BR>Principal components are the coordinates of the observations on
the basis of the new variables (namely the columns of <IMG height=16 alt=$V$
src="Bootstrapping a Principal Component Analysis.files/img252.png" width=20
align=bottom border=0>) and they are the rows of <IMG height=16 alt=$G=XV=US$
src="Bootstrapping a Principal Component Analysis.files/img253.png" width=129
align=bottom border=0>. The components are orthogonal and their squared lengths are the
squared singular values: writing C for the matrix of components, <!-- MATH $C^\prime C=SU^\prime US=S^2$ --><IMG height=19
alt="$C^\prime C=SU^\prime US=S^2$"
src="Bootstrapping a Principal Component Analysis.files/img254.png" width=168
align=bottom border=0>.
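<P>These identities are easy to verify numerically. A hedged NumPy sketch (the data matrix is illustrative, not the score data):</P>

```python
# Check that the principal components G = XV = US are orthogonal with
# G'G = S^2, and that for centered data the eigenvalues of the covariance
# matrix are the squared singular values divided by n - 1.
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)                      # column-centered

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V, S = Vt.T, np.diag(s)

G = X @ V                                   # principal components (rows)
print(np.allclose(G, U @ S))                # G = XV = US
print(np.allclose(G.T @ G, S**2))           # G'G = S^2 (diagonal)

evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(np.allclose(np.sort(s**2) / (n - 1), np.sort(evals)))
```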
<P></P>In the same way the principal axes are defined as the rows of the matrix <!-- MATH $Z=U^\prime X=SV^\prime$ --><IMG height=17
alt="$Z=U^\prime X=SV^\prime$"
src="Bootstrapping a Principal Component Analysis.files/img243.png" width=137
align=bottom border=0>; these give the coordinates of the variables on the basis of the columns of
<IMG height=16 alt=$U$
src="Bootstrapping a Principal Component Analysis.files/img255.png" width=19
align=bottom border=0>. <BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}Z=S^{-1}C^\prime X \quad\mbox{and}\quad G=XZ^\prime S^{-1}.\end{displaymath} --><IMG
height=28
alt="\begin{displaymath} Z=S^{-1}C^\prime X \quad\mbox{and}\quad G=XZ^\prime S^{-1}. \end{displaymath}"
src="Bootstrapping a Principal Component Analysis.files/img256.png" width=283
border=0> </DIV><BR clear=all>
<P></P>However, this decomposition depends heavily on the unit of
measurement (scale) of the variables. It is only used, in
fact, when the <IMG height=35 alt="$X_{\cdot k}$"
src="Bootstrapping a Principal Component Analysis.files/img257.png" width=32
align=middle border=0> are all of the same order of magnitude. Usually each
variable is assigned a weight that accounts for its overall scale; this
weight is very often inversely proportional to the variable's variance.
<P>So we replace the usual Euclidean metric between observations by a weighted one:
<P></P>
<DIV align=center><!-- MATH \begin{eqnarray*}d\,\left( x_{i\cdot}, x_{j\cdot}\right) &=& \left( x_{i\cdot}- x_{j\cdot}\right)^\prime \left( x_{i\cdot}- x_{j\cdot}\right)\\d\,\left( x_{i\cdot}, x_{j\cdot}\right) &=& \left( x_{i\cdot}- x_{j\cdot}\right)^\prime Q\, \left( x_{i\cdot}- x_{j\cdot}\right)\\Q &=& \left(\begin{array}{ccc}\frac{1}{\sigma^2_1}\\& \ddots\\& & \frac{1}{\sigma^2_p}\end{array}\right).\end{eqnarray*} --><IMG
height=154
alt="\begin{eqnarray*} d \left( x_{i\cdot}, x_{j\cdot}\right) &=& \left( x_{i\cdot}-... ...2_1}\\ & \ddots\\ & & \frac{1}{\sigma^2_p} \end{array}\right). \end{eqnarray*}"
src="Bootstrapping a Principal Component Analysis.files/img258.png" width=314
border=0></DIV><BR clear=all>
<P></P>The same can be said of the observations; some may be more ``important''
than others (for instance, an observation that represents the mean of a larger group).
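<P>With a diagonal metric Q of inverse variances, the weighted analysis amounts to standardizing each column to unit variance, i.e. a PCA of the correlation matrix. A minimal NumPy sketch with illustrative data:</P>

```python
# Columns of X have wildly different scales; multiplying the centered
# data by Q^{1/2} = diag(1/sigma_1, ..., 1/sigma_p) makes the covariance
# of the weighted data equal the correlation matrix of the original data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3)) * np.array([1.0, 10.0, 100.0])  # mixed scales
Xc = X - X.mean(axis=0)

Qhalf = np.diag(1.0 / Xc.std(axis=0, ddof=1))   # Q^{1/2}
Xw = Xc @ Qhalf                                  # weighted (standardized) data

print(np.allclose(np.cov(Xw, rowvar=False), np.corrcoef(X, rowvar=False)))  # True
```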
<H3><A name=SECTION002121100000000000000>Matlab for the Scores Example (in
handout 4/27/99)</A> </H3><B>Score data</B> <PRE>>> round(cov(scor88))
ans =
306 127 102 106 117
127 173 85 95 99
102 85 113 112 122
106 95 112 220 156
117 99 122 156 298
>> corrcoef(scor88)
ans =
1.0000 0.5534 0.5468 0.4094 0.3891
0.5534 1.0000 0.6096 0.4851 0.4364
0.5468 0.6096 1.0000 0.7108 0.6647
0.4094 0.4851 0.7108 1.0000 0.6072
0.3891 0.4364 0.6647 0.6072 1.0000
>> mean(scor88)
ans = 38.95 50.59 50.60 46.68 42.3068
>> scorc88=scor88-ones(88,1)*mean(scor88);
>> mean(scorc88)
ans =1.0e-13 *
0.1421 -0.1970 -0.0565 -0.0436 -0.0363
>> [U,S,V] = svd(scorc88,0);
>> S
244.4752 0 0 0 0
0 132.6034 0 0 0
0 0 95.0053 0 0
0 0 0 85.8070 0
0 0 0 0 52.8898
>> V
V = 0.5054 -0.7487 -0.2998 0.2962 -0.0794
0.3683 -0.2074 0.4156 -0.7829 -0.1889
0.3457 0.0759 0.1453 -0.0032 0.9239
0.4511 0.3009 0.5966 0.5181 -0.2855
0.5347 0.5478 -0.6003 -0.1757 -0.1512
>> [VE, LE]=eig(cov(scorc88))
VE =
0.2962 -0.2998 0.0794 -0.7487 0.5054
-0.7829 0.4156 0.1889 -0.2074 0.3683
-0.0032 0.1453 -0.9239 0.0759 0.3457
0.5181 0.5966 0.2855 0.3009 0.4511
-0.1757 -0.6003 0.1512 0.5478 0.5347
LE =
84.6304 0 0 0 0
0 103.7473 0 0 0
0 0 32.1533 0 0
0 0 0 202.1111 0
0 0 0 0 686.9898
</PRE>Nonparametric bootstrap of the proportion of variance carried by the
first k axes. <PRE>function out=bpca(B,orig,k)
%Bootstrap of the PCA of a dataset
%Output the proportion
%of variance for the first k axes
[n,p]=size(orig);
out=zeros(1,B);
for b = 1:B
newsample=bsample(orig);
out(b)=pca(newsample,n,p,k);
end %---------------------------
function out=bsample(orig)
%Function to create one resample from
%the original sample orig, where
%orig is the original data, and is a matrix
[n,p]=size(orig);
indices=randint(1,n,n)+1;
out=orig(indices,:);
%-------------------------------
function out=pca(data,n,p,k)
%Do a principal component analysis
%Return the ratio of the variance
%explained by the first k components
data=data-ones(n,1)*mean(data);
[U,S,V] = svd(data,0);
S2=diag(S).^2;
out=sum(S2(1:k))/sum(S2(1:p));
%---------------------------------
>> resbpca10K1=bpca(10000,scor88,1);
>> resbpca10K=bpca(10000,scor88,2);
>> hist(resbpca10K,100)
</PRE>one component <BR><IMG
src="Bootstrapping a Principal Component Analysis.files/btspca1.gif" width=400>
two components <BR><IMG
src="Bootstrapping a Principal Component Analysis.files/btspca2.gif" width=400>
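<P>The bpca/bsample/pca routines above translate to the following NumPy sketch (the function names are mine, and a random matrix stands in for scor88; B is kept small here):</P>

```python
# Nonparametric bootstrap: resample rows with replacement B times and
# record, for each resample, the proportion of variance explained by the
# first k principal axes.
import numpy as np

def pca_ratio(data, k):
    """Proportion of variance explained by the first k components."""
    data = data - data.mean(axis=0)
    s2 = np.linalg.svd(data, compute_uv=False)**2
    return s2[:k].sum() / s2.sum()

def bpca(B, orig, k, rng):
    """Bootstrap distribution of pca_ratio over B row-resamples."""
    n = orig.shape[0]
    out = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)    # like MATLAB's randint(1,n,n)+1
        out[b] = pca_ratio(orig[idx, :], k)
    return out

rng = np.random.default_rng(4)
orig = rng.standard_normal((88, 5))         # stand-in for scor88
res = bpca(200, orig, 2, rng)
print(res.shape, float(res.mean()))
```

A histogram of <code>res</code> plays the role of <code>hist(resbpca10K,100)</code> above.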
<P>
<H3><A name=SECTION002121200000000000000>How to generate a multivariate
normal</A> </H3>The Cholesky decomposition of a symmetric positive definite matrix <IMG
height=16 alt=$A$
src="Bootstrapping a Principal Component Analysis.files/img259.png" width=19
align=bottom border=0> is the factorization of the form <BR>
<P></P>
<DIV align=center><!-- MATH \begin{displaymath}A=L ^\prime L\end{displaymath} --><IMG
height=28 alt="\begin{displaymath}A=L ^\prime L\end{displaymath}"
src="Bootstrapping a Principal Component Analysis.files/img260.png" width=68
border=0> </DIV><BR clear=all>
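<P>A NumPy sketch of the same generator (the target covariance here is a small illustrative matrix, not cov(scor88); note NumPy's cholesky returns the lower factor, so we transpose to match MATLAB's upper-triangular chol):</P>

```python
# If A = L'L with L upper triangular, then rows of (standard normal) @ L
# have covariance approximately A.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # illustrative target covariance
L = np.linalg.cholesky(A).T                 # upper triangular, A = L'L

rng = np.random.default_rng(5)
xs = rng.standard_normal((100_000, 2))      # iid N(0,1) entries
ys = xs @ L                                 # rows approximately N(0, A)

print(np.allclose(L.T @ L, A))              # factorization check: True
print(np.cov(ys, rowvar=False))             # close to A
```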
<P></P><PRE>>> chl=chol(cov(scor88));
>> round(chl'*chl)
ans =
306 127 102 106 117
127 173 85 95 99
102 85 113 112 122
106 95 112 220 156
117 99 122 156 298
>> xs=randn(88,5);
>> mean(xs)
ans =
0.0480 -0.2056 0.1209 0.1097 0.1051
>> corrcoef(xs)
ans =
1.0000 -0.1623 -0.0664 -0.1947 -0.1042
-0.1623 1.0000 0.1412 0.0361 0.0600
-0.0664 0.1412 1.0000 -0.2334 0.0056
-0.1947 0.0361 -0.2334 1.0000 -0.0734
-0.1042 0.0600 0.0056 -0.0734 1.0000
>> ys=xs*chl;
>> cov(ys)
ans =
225.8649 65.7374 57.5737 29.1913 40.6952
65.7374 149.5844 74.5579 74.1853 83.3744
57.5737 74.5579 93.5467 66.5157 89.7249
29.1913 74.1853 66.5157 152.4465 88.0262
40.6952 83.3744 89.7249 88.0262 210.1678
</PRE><B>Parametric Bootstrap Simulations</B> <PRE>>> xs=randn(88,5);
>> mean(xs)
ans= 0.013 0.035 -0.151 -0.059 0.033
>> ys=xs*chl;
>> round(cov(ys))
309 169 147 130 150
169 237 137 160 146
147 137 161 158 149
130 160 158 252 172
150 146 149 172 272
function out=spca(B,orig,k)
%Parametric Bootstrap of the PCA of a dataset
%Output the proportion
%of variance for the first k axes
out=zeros(1,B);
[n,p]=size(orig);
chl=chol(cov(orig));
for b = 1:B
rands=randn(n,p);
newsample=rands*chl;
out(b)=pca(newsample,n,p,k);
end
r100k=spca(100000,scor88,2);
hist(r100k,200);
</PRE>
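<P>For comparison, a NumPy sketch of spca (function names are mine; a random matrix again stands in for scor88, and B is kept small): each replicate draws a fresh n-by-p standard-normal matrix and maps it through the Cholesky factor of the sample covariance, instead of resampling rows.</P>

```python
# Parametric bootstrap under a multivariate-normal model fitted to the data.
import numpy as np

def pca_ratio(data, k):
    """Proportion of variance explained by the first k components."""
    data = data - data.mean(axis=0)
    s2 = np.linalg.svd(data, compute_uv=False)**2
    return s2[:k].sum() / s2.sum()

def spca(B, orig, k, rng):
    """Parametric bootstrap distribution of pca_ratio."""
    n, p = orig.shape
    L = np.linalg.cholesky(np.cov(orig, rowvar=False)).T   # cov = L'L
    out = np.empty(B)
    for b in range(B):
        out[b] = pca_ratio(rng.standard_normal((n, p)) @ L, k)
    return out

rng = np.random.default_rng(6)
orig = rng.standard_normal((88, 5))         # stand-in for scor88
res = spca(200, orig, 2, rng)
print(res.shape, float(res.mean()))
```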
<P>
<HR>
<ADDRESS>Susan Holmes 2004-05-19 </ADDRESS></BODY></HTML>