echo on
%PLSDEMO Demonstrates PLS and PCR functions
%
% This demonstration illustrates the use of the PLS and
% PCR functions in the PLS_Toolbox.
echo off
% Copyright
% Barry M. Wise
% 1992
% Modified June 1994
echo on
% The data we are going to work with is from a Liquid-Fed
% Ceramic Melter (LFCM). We will develop a correlation between
% temperatures in the molten glass tank and the tank level.
% Let's start by loading and plotting the data. Hit a key when
% you are ready.
pause
echo off
load plsdata
subplot(2,1,1)
plot(xblock1);
title('X-block Data (Predictor Variables) for PLS Demo');
xlabel('Sample Number');
ylabel('Temperature (C)');
subplot(2,1,2)
plot(yblock1)
title('Y-Block Data (Predicted Variable) for PLS Demo');
xlabel('Sample Number');
ylabel('Level (Inches)');
echo on
% You can probably already see that there is a very regular
% variation in the temperature data and that it appears to
% correlate with the level data. This is because there is a
% steep temperature gradient in the molten glass, and when
% the level changes, glass at different temperatures passes
% by the location of the thermocouples.
pause
% Let's use the fact that temperature correlates with
% level to build PLS and PCR models that use temperature
% to predict level. We will start by mean-centering the data.
% Here mean-centering makes sense because all of the variables
% are of the same type, and we have reason to expect that
% the temperatures with the most variance will also be the
% most predictive for level.
pause
[mxblock1,mx] = mncn(xblock1);
[myblock1,my] = mncn(yblock1);
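% (For reference, MNCN here just removes each column's mean and
% returns those means so that new data can be scaled the same
% way later. A plain-MATLAB sketch of the same operation,
% assuming samples are rows and variables are columns:
%
%   mx = mean(xblock1);
%   mxblock1 = xblock1 - ones(size(xblock1,1),1)*mx;
%
% This sketch is only illustrative of what MNCN computes.)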
% Now that the data is scaled we can use the PLS and PCR routines
% to make a calibration. Let's start by using all the data to
% make models and see how much variance they capture. We'll also
% make a model using MLR and compare it to the PLS and PCR models.
pause
[p,q,w,t,u,b,ssqdif] = pls(mxblock1,myblock1,10);
% Separate output names are used for PCR1 so its scores, loadings
% and regression matrix do not overwrite the PLS results above.
[tpcr,ppcr,bpcr10] = pcr1(mxblock1,myblock1,10);
mlrmod = mxblock1\myblock1;
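% (The backslash operator above solves the least-squares problem
% min ||mxblock1*b - myblock1||. When mxblock1'*mxblock1 is
% invertible, this is equivalent to the familiar normal-equations
% form:
%
%   mlrmod = inv(mxblock1'*mxblock1)*mxblock1'*myblock1;
%
% though backslash is the numerically preferable way to compute it.)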
pause
% Take a close look at the variance captured by the PLS and PCR
% models. Notice that for any particular number of LVs or PCs,
% the PLS model always captures just a bit more Y-Block
% (predicted variable) variance, while the PCR model always
% captures just a bit more X-Block (predictor variable)
% variance. This is because the principal components
% decomposition of the X-Block in PCR captures the maximum amount
% of variation that can be explained with linear factors without
% regard to how well they correlate with the Y-Block (in this
% case they do correlate quite well). PLS, on the other hand,
% tries to capture more Y-Block variance as well as describing
% X-Block variance. Thus, PLS always gets more Y-Block variance
% and less X-Block variance than PCR.
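% (As an illustration of the decompose-then-regress idea behind
% PCR, here is a plain-MATLAB sketch using an SVD-based PCA with
% k components; this is only a sketch, not necessarily what PCR1
% does internally:
%
%   [u,s,v] = svd(mxblock1,0);
%   tk = u(:,1:k)*s(1:k,1:k);        % scores on first k PCs
%   bk = v(:,1:k)*(tk\myblock1);     % regression vector mapped
%                                    % back to the original
%                                    % (mean-centered) variables
%
% Prediction is then yhat = mxblock1*bk.)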
pause
% We can also see from the variance captured by the PLS and
% PCR models that 1 latent variable or principal component
% is pretty good and anything after 4 doesn't really add much.
% However, we really need to cross validate to determine the
% optimum number of latent variables and principal components.
% For this we will use the PLSCVBLK and PCRCVBLK functions.
% The reason for using these routines is that the data is
% serially correlated, so we should really split it into
% contiguous blocks. This method decreases the correlation
% between any serially correlated noise in the training and
% test sets.
pause
% Before we use PLSCVBLK and PCRCVBLK we must decide how many
% times to rebuild and test the model. I'll choose 5, which
% splits the 300 calibration samples into contiguous test blocks
% of 60 samples each; it is reasonable to expect that any
% disturbance in this system would have died away after 60
% samples. The maximum number of LVs and PCs is set to 10 since
% it doesn't look like any more than that would be of any use.
% As the functions
% run you will see the PRESS plots for each time the model is
% rebuilt and tested. After all the trials the function finds
% the number of LVs or PCs for minimum PRESS. (Note that the
% user prompt to override the chosen number of LVs or PCs
% has been turned off for this demo.) The function then
% calculates the regression vector with the optimum number
% of LVs or PCs.
pause
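% (PRESS is the prediction error sum of squares accumulated over
% the cross-validation test blocks. For a model with k LVs or
% PCs it is, roughly:
%
%   press(k) = sum( (ytest - yhat_cv).^2 )
%
% summed over all left-out samples, where yhat_cv is predicted
% from a model built without those samples.)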
subplot(1,1,1)
[plsss,cplsss,mlv,bpls] = plscvblk(mxblock1,myblock1,5,10,1);
drawnow
[pcrss,cpcrss,mpc,bpcr] = pcrcvblk(mxblock1,myblock1,5,10,1);
% You may have noticed that for PLS it was determined that
% 5 LVs was optimal, while for PCR 7 PCs was optimal. This
% is typical of PLS relative to PCR. Because PLS finds factors
% that correlate with the predicted variable, it generally
% reaches the minimum in prediction error with fewer factors
% than PCR.
pause
% It is also interesting to look at how the Cumulative PRESS
% plots compare with each other. Note that the PCR model appears
% to have a better PRESS value at the minimum than the PLS
% model.
echo off
plot(1:10,cplsss,'-y',1:10,cplsss,'+y'), hold on
plot(1:10,cpcrss,'-g',1:10,cpcrss,'og'), hold off
title('Comparison of PRESS for PLS (+) and PCR (o) Models')
xlabel('Number of Latent Variables or Principal Components')
ylabel('Model Prediction Error - PRESS')
echo on
% By plotting the regression vectors we can see which variables
% were important in predicting the level. We can also compare
% them to the MLR model.
pause
echo off
plot(1:20,bpls,'-y',1:20,bpls,'+y',[1 20],[0 0],'-r'), hold on
plot(1:20,bpcr,'-g',1:20,bpcr,'og')
plot(1:20,mlrmod,'-c',1:20,mlrmod,'*c'), hold off
title('PLS (+), PCR (o) and MLR (*) Regression Vector Coefficients For Level Prediction');
xlabel('Variable Number');
ylabel('Coefficient');
pause
echo on
% Notice how the MLR model is more "spikey". This "ringing"
% in the coefficients is typical of models identified with
% MLR when there is a great deal of correlation structure in
% the data, as we have here.
% Now let's use the regression vectors to calculate the fitted
% level for the training (calibration) data and compare it to
% the actual level.
pause
ypls = mxblock1*bpls;
ypcr = mxblock1*bpcr;
ymlr = mxblock1*mlrmod;
sypls = rescale(ypls,my);
sypcr = rescale(ypcr,my);
symlr = rescale(ymlr,my);
echo off
s = 1:300;
plot(s,sypls,'-y',s,sypls,'+y'), hold on
plot(s,sypcr,'-g',s,sypcr,'og')
plot(s,symlr,'-c',s,symlr,'*c')
plot(s,yblock1,'-r',s,yblock1,'xr'), hold off
title('Actual (x) and Fitted Level by PLS (+), PCR (o) and MLR')
xlabel('Sample Number');
ylabel('Level (Inches)');
pause
echo on
% This looks pretty good, but let's try the models on a new data
% set to see how well they predict. We start by scaling the
% new data using the same factors we used to scale the original
% data.
pause
sxblock2 = scale(xblock2,mx);
syblock2 = scale(yblock2,my);
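% (SCALE applies the calibration means to the new data, and
% RESCALE undoes the centering on the predictions. A plain-MATLAB
% sketch of both, assuming mean-centering only:
%
%   sxblock2 = xblock2 - ones(size(xblock2,1),1)*mx;
%   sypls    = newypls + my;    % my is the (scalar) mean level
%
% Note that syblock2 is computed for completeness, but the
% comparisons below use the raw yblock2 directly.)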
% Now we just multiply the new xblock by the regression vectors
% to get the new prediction. After rescaling we can compare the
% predicted and actual data.
pause
newypls = sxblock2*bpls;
newypcr = sxblock2*bpcr;
newymlr = sxblock2*mlrmod;
sypls = rescale(newypls,my);
sypcr = rescale(newypcr,my);
symlr = rescale(newymlr,my);
echo off
s = 1:200;
plot(s,sypls,'-y',s,sypls,'+y'), hold on
plot(s,sypcr,'-g',s,sypcr,'og')
plot(s,symlr,'-c',s,symlr,'*c')
plot(s,yblock2,'-r',s,yblock2,'xr'), hold off
title('Actual (x) and Predicted Level by PLS (+), PCR (o) and MLR')
xlabel('Sample Number');
ylabel('Level (Inches)');
pause
echo on
% We can also calculate the total sum of squared prediction error
% for the PLS, PCR and MLR models as follows:
echo off
plsssq = sum((yblock2-sypls).^2);
pcrssq = sum((yblock2-sypcr).^2);
mlrssq = sum((yblock2-symlr).^2);
disp(' PLS error PCR error MLR error'),
disp([plsssq pcrssq mlrssq])
echo on
% So here we see that the PLS and PCR models are slightly
% better than the MLR model, as expected.