📄 statdemo.m

📁 偏最小二乘算法在MATLAB中的实现
💻 M
字号:
echo off, clc
%STATDEMO Elementary stats, t test, F test and AVOVA

%  Copyright
%  Barry M. Wise
%  1994
load statdata
echo on

% This script demonstrates the elementary statistics
% functions in the PLS_Toolbox including the t test,
% F test and one and two way analysis of variance routines.
% The development in the script follows Chapter 2 of 
% "Chemometrics" by Sharaf, Illman and Kowalski, John Wiley
% & Sons, 1986.

% First lets consider the t test. A t test can be used to
% compare the means of two samples. For instance, suppose
% that we have tested the percent yield from a given
% chemical reaction using two different catalysts
% and would like to determine if the difference in yield
% is significant. A table of our data is below.
pause
echo off
disp('  ')
disp('Percent yield with A and B')
disp('    A      B')
disp(a1dat(:,1:2))
disp('   ')
disp('Mean yield with A and B')
disp(mean(a1dat(:,1:2)));
disp('  ')
disp('Yield variance with A and B')
disp(std(a1dat(:,1:2)).^2);
echo on
% For the moment, lets assume that we know that the 
% populations for each of the catalysts should have
% equal variances, even though it is apparent that
% our sample variances are not equal.
pause

% The t test statistic is defined as
%
%         [(x1bar-x2bar)-(mu1-mu2)][n1+n2-2]^0.5 
% t = -----------------------------------------------
%  v  [1/n1 + 1/n2]^0.5 [(n1-1)s1^2 + (n2-1)s2^2]^0.5
%
% where
% n1, n2     = size of first and second samples
% mu1, mu2   = true mean of first and second populations
% s1^2, s2^2 = variance of first and second samples
% v          = n1 + n2 - 2 is the degress of freedom
%
pause

% The quantity
%
%     (n1-1)s1^2 + (n2-1)s2^2
%     -----------------------
%           n1 + n2 - 2
% 
% is the pooled variance and is the best unbiased estimator
% of the population variance.

% If we assume that mu1 = mu2 then the t statistic can
% be calculated as follows
pause
echo off

t = abs(((mean(a1dat(:,1))-mean(a1dat(:,2)))*sqrt(14))/...
    (sqrt(1/4)*sqrt(7*std(a1dat(:,1)).^2 + 7*std...
	(a1dat(:,2)).^2)))
echo on
	
% We can now compare this value of t to the value from
% a t table calculated by the TTESTP function. The first
% input to the function is the probability point corresponding
% to our desired confidence level. If we want 95% confidence,
% we would choose .05 since 100(1-.05) = 95%. The second
% input is the degrees of freedom in the problem, which in
% this case is 14 (8 + 8 - 2). The final input is a flag
% which tells the function that we wish to input a probability
% point and have it return the value of the t statistic 
% instead of the other way around (which we will use shortly).

tt = ttestp(.05,14,2)
pause

% As you can see, the value of t is greater than the value
% expected from the t test function, telling us that the 
% means are significantly different at the 95% level. Note
% however, that if we had chosen 99% confidence levels, 
% the TTESTP function would yield

tt = ttestp(.01,14,2)

% and the difference between catalysts would not be significant
% at the 99% level. It is also possible to calculate
% the exact confidence level at which the test fails using
% the inverse test as follows:
pause

tt = ttestp(t,14,1)

% Thus, we see that the hypothesis of equal means fails
% at the 
echo off
100*(1-tt)
echo on
% percent confidence level. Note that we input the t value
% calculated above with the correct number of degrees of
% freedom (14) and set the flag to 1 so that the function
% would know that we intend to input the t value and would
% like to have the probablility point.
pause

% Earlier in this example, it was assumed that the population
% variances were equal even though the samples variances
% were clearly unequal. It is easy to check the assumption
% of equal population variances using an F test. To do this,
% we simply calculate the ratio of the variances of
% the two samples, using the greater variance as the
% numerator

F = std(a1dat(:,1))^2/std(a1dat(:,2))^2

pause
% This value can be compared to the value from an F-table
% to determine its significance. In our case, we have 
% 7 degrees of freedom in each sample, using our F test
% function, FTEST, we obtain

ff = ftest(.05,7,7) 

% and we can see that F calculated above is much less
% than the value for 95% confidence, thus, the assumption
% that the populations have equal variances is valid.
pause

% It is also possible to use the t-test when observations
% are paired. For example, imagine a series of observations
% were made on soil samples before and after treatment
% to remove some noxious metal, say Hg. Suppose the
% following data results
echo off
disp('  ')
disp('Concentration of Cr Before and After Treatment')
disp('   Before    After      Difference')
disp([tdat2 tdat2(:,1)-tdat2(:,2)])
echo on
pause
% In this case, the important variable is the difference
% between the observations before and after treatment.
% Thus, we must test to see if the difference is
% significantly different from zero to see if the
% treatment is effective. The relevant t statistic can
% be calculated from
%
%       dbar - mu
%  t  = --------- sqrt(n)
%   v      sd
%
% where dbar is the mean difference between the samples,
% mu is the true mean difference, n is the sumber of
% samples, v = n-1 is the degrees of freedom and sd is
% the standard deviation of the differences.
pause

% Applying this we obtain
echo off
t = mean(tdat2(:,1)-tdat2(:,2))*sqrt(10)/...
std(tdat2(:,1)-tdat2(:,2))
echo on
% Checking the t from the tables we get

tt = ttestp(.05,9,2)

% and we can see that the treatment does have a significant
% effect at the 95% confidence level. We can also use
% the inverse t test to determine the level at which 
% the significance test would fail as follows
pause

tt = ttestp(t,9,1)

% And we see that the it would fail at the

100*(1-tt)

% percent confidence level.
pause

% Now suppose that we have added an additional treatment
% to our first data set, e.g. we are now looking at 
% three catalysts instead of two. The percent yields
% for catalysts A, B and C are now
pause
echo off
disp('  ')
disp('Percent yield with Catalysts A, B and C')
disp('    A     B     C')
disp(a1dat)
disp('   ')
disp('Mean yield with A, B and C')
disp(mean(a1dat));
disp('  ')
disp('Yield variance with A, B and C')
disp(std(a1dat).^2);
echo on
pause

% In order to see if the treatments are having an effect,
% we can use analysis of variance or ANOVA. In this case,
% we are only looking for the effect of one factor, so
% we will perform one-way ANOVA with the ANOVA1W function
% as shown below. We need only input the data and the 
% desired confidence level.
pause

anova1w(a1dat,.95)

% From this we can see that the effect of the factor is
% significant at the 95% confidence level, that is, we
% are 95% certain the catalysts are having a significant
% effect on the yield.
pause

% Now suppose we have a situation where there are two 
% factors which could affect the outcome of the experiment.
% Typically, we would call the first one a factor and 
% arrange our data into "blocks" where the second
% factor is constant. Consider the following data where
% the concentration of Cr is measured at different
% soil depths and different distances from a hazardous
% waste site:
pause
echo off
disp('  ')
disp('    Concentration of Cr near waste disposal site')
disp('  ')
disp('                     Distance from Site (km)')
disp('             -------------------------------------')
disp('   Depth (m)    1         2         3         4 ')
disp([[0 0.5 1]' a2dat1])
echo on

pause
% Are both the distance and depth significant in determining
% the concentration of Cr? We can find out using a two way
% analysis of variance with AVOVA2W as follows by simply
% inputing the data matrix and the desired confidence level.  
pause

anova2w(a2dat1,.95)

pause
% From this we can see that the effect of the factors (the
% distance from the site) and the blocks (the depth) are
% significant at the 95% level.

% Finally, suppose that we would like to determine if there
% is a significant difference between the concentration of
% some analyte in some samples and simultaneously determine
% if there is a difference among methods used to analyze 
% them. We start with the following data:
pause

echo off
disp('   ')
disp('               Measured concentration of Analyte')
disp('                         Sample Number')
disp('             -------------------------------------')
disp('    Method      1         2         3         4         5         6')
disp([[1:5]' a2dat2])
echo on
pause

% Once again, we can use ANOVA2W to determine if the effects
% of the Samples (factors) and method (blocks) are significant
pause

anova2w(a2dat2,.95)

% And once again, we see that at the 95% level both of the
% factors are significant.

% For more information on TTESTP, FTEST, ANOVA1W and ANOVA2W,
% please see their respective help files.
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -