function [w, sig, c, c1, k] = bds(series, maxdim, distance, flag, maxram)

%BDS Brock, Dechert & Scheinkman test for independence based on the correlation dimension
%
% [W, SIG, C, C1, K] = BDS (SERIES, MAXDIM, DISTANCE, FLAG, MAXRAM)
%
% uses       - time-series vector SERIES (1),
%            - dimensional distance
%              * either defined as fraction DISTANCE of the standard deviation of SERIES
%                if FLAG = 0,
%              * or defined such that the one dimensional correlation integral of SERIES
%                is equal to DISTANCE if FLAG = 1 (2),
%            - not more than MAXRAM megabytes of memory for the computation (3),
%
% to compute - BDS statistics W for each dimension between 2 and MAXDIM (4),
%            - significance levels SIG at which the null hypotheses of no dependence are
%              rejected ASYMPTOTICALLY (use companion function BDSSIG.M for finite
%              samples) against (almost) any type of linear and non-linear dependence (5), 
%            - correlation integral estimates C for each dimension M between 2 and MAXDIM,
%            - first-order correlation integral estimates C1 computed over the last N-M+1
%              observations, and
%            - parameter estimate K (6).
%
% (1) SERIES is normally a vector of residuals obtained from a regression, but it can also
%     be any other stationary time series.
% (2) The default settings are DISTANCE = 1.5 and FLAG = 0. The BDS statistic appears to
%     be most efficiently estimated if the measure of dimensional distance EPSILON is
%     chosen such that the first-order correlation integral estimate (C1) lies around 0.7
%     (see Kanzler, 1998, forthcoming). For settings DISTANCE = 0.7 and FLAG = 1, the
%     programme will choose EPSILON accordingly. Unfortunately, the cost of finding optimal
%     EPSILON is quite high in terms of CPU time and required memory. For a near-normal
%     distribution, the default settings achieve the same without any extra computational
%     burden.
% (3) The default setting is MAXRAM = 150, which is recommended for a system with 192MB
%     physical RAM installed. The programme is highly optimised so as to maximise speed given
%     available memory, so it is very important to specify MAXRAM correctly as the amount
%     of physical memory available AFTER starting MATLAB, loading any data and running
%     other applications concurrently. The smaller the amount of RAM available to the
%     programme (in relation to the length of SERIES), the slower the algorithm chosen
%     from six alternatives. However, if MAXRAM is chosen too large, MATLAB will make use
%     of virtual (hard-disk) memory, and this will slow down computation considerably.
% (4) The default setting for MAXDIM is 2. For MAXDIM = 1, W and SIG are empty.
% (5) A vector of NaN is returned if the MATLAB Statistics Toolbox is not installed.
% (6) The BDS statistic W(M) is a function of C(M), C1(1), C1(M) and K, and these
%     estimates are normally of no further interest.
%
% See Kanzler (1998) for some explanation of the main parts of the algorithm (further
% explanations are given as comments in the code below), for a detailed investigation of the
% finite-sample properties of the BDS statistic, for tables of small-sample quantiles and
% for a comparison with software by Dechert (1988) and LeBaron (1988, 1990, 1997a, 1997b).
% These and other important references can be found at the end of the script.
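%
% Example (an illustrative sketch added for orientation, not part of the original
% documentation; the simulated series X is a hypothetical input):
%
%     x        = randn(1, 500);        % i.i.d. normal series, so no dependence is present
%     [w, sig] = bds(x, 4);            % W and SIG for dimensions 2, 3 and 4, using the
%                                      % defaults DISTANCE = 1.5 and FLAG = 0
%     [w, sig] = bds(x, 4, 0.7, 1);    % EPSILON chosen such that C1 is about 0.7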
%
% * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
% * All rights reserved. This script may be redistributed if it is left unaltered in    *
% * its entirety (619 lines, 31422 bytes) and if nothing is charged for redistribution. *
% * Usage of the programme in applications and alterations of the code should be        *
% * referenced properly. See http://users.ox.ac.uk/~econlrk for updated versions.       *
% * The author appreciates suggestions for improvement or other feedback.               *
% * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
%
%                    Copyright (c) 14 April 1998  by Ludwig Kanzler
%                    Department of Economics,  University of Oxford
%                    Postal: Christ Church, Oxford OX1 1DP, England
%                    E-mail:  ludwig.kanzler@economics.oxford.ac.uk
%                    $ Revision: 2.41 $ $ Date: 15 September 1998 $

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% % % % % % % % % % Executable part of main function BDS.M starts here  % % % % % % % % %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

%%%%%%%%%%%%%%%%%%%%%% Check and transformation of input arguments %%%%%%%%%%%%%%%%%%%%%%

if nargin < 5
   maxram = 150;
elseif maxram > 500
   disp('Are you sure you have so much memory available?')
   error('If so, you need to edit the code, otherwise try again with a lower value.')
end

if nargin < 4
   flag = 0;
elseif ~any(flag == [0 1])
   error('Unknown method for determining dimensional distance; try again with 0 or 1.')
end

if nargin < 3
   distance = 1.5;
elseif distance <= 0
   error('The dimensional distance parameter must be positive.')
elseif flag == 1 & distance > 1
   error('The correlation integral cannot exceed 1.')
end

if nargin < 2
   maxdim = 2;
elseif maxdim < 1
   error('The dimension needs to be at least 1.');
end

if nargin < 1
   error('Cannot compute the BDS statistic on nothing.')
end

[rows,cols] = size(series);
if rows > 1 & cols == 1
   n = rows;
   series = series';
elseif cols > 1 & rows == 1
   n = cols;
elseif cols > 1 & rows > 1
   n = cols*rows;
   series = series(:)'; % transformation into a row vector
   disp(sprintf('\aTransformed matrix input into a single row vector.'))
else
   error('Cannot compute the BDS statistic on a scalar!')
end

%%%%%%%%%%%% Determination of and preparations for fastest method given MAXRAM %%%%%%%%%%%

fastbuild     = 0.000016 * (1:52) .* pow2(1:52); % memory requirements
slowbuild     = 0.000045           * pow2(1:52); % for the various
holdinfo      = 0.000005           * pow2(1:52); % algorithms in 
wordtable     = 0.000008 *   n^2  ./     (1:52); % megabytes for 
bitandop      = 0.000024 *   n^2  ./     (1:52); % given N

[ram1, bits1] = min(fastbuild + holdinfo + wordtable + bitandop); % number of bits for
[ram2, bits2] = min(fastbuild + holdinfo + wordtable);            % which each of six
[ram3, bits3] = min(slowbuild + holdinfo + wordtable + bitandop); % methods uses minimum
[ram4, bits4] = min(slowbuild + holdinfo + wordtable);            % memory; this memory
[ram5, bits5] = min(                       wordtable + bitandop); % is given by
[ram6, bits6] = min(                       wordtable);            % ram1, ram2,..., ram6

if ram1 < maxram | ram2 < maxram
   if ram1 < maxram
      method = 1;
      bits = bits1; ram = ram1;
   else
      method = 2;                                     % maximum number of rows to put
      bits = bits2; ram = ram2;                       % through BITAND and bit-counting
      stepping = floor((maxram-ram)*bits/n/0.000024); % algorithm without exceeding MAXRAM
   end

   % Vector BITINFO lists the number of bits set for each integer between 0 and 2^bits-1
   % (corresponding to the indices of the vector shifted by 1). See Kanzler (1998) for an
   % explanation.
   bitinfo = uint8(sum(rem(floor((0:pow2(bits)-1)'*pow2(1-bits:0)),2),2));
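
   % Illustration only (a sketch, not part of the original algorithm; B is a hypothetical
   % variable): for 3 bits, the expression above yields [0 1 1 2 1 2 2 3]', i.e. the
   % number of bits set in each of the integers 0, 1, ..., 7. It can be checked in
   % isolation at the command line:
   %    b = 3;
   %    sum(rem(floor((0:pow2(b)-1)'*pow2(1-b:0)),2),2)'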

elseif ram3 < maxram | ram4 < maxram
   if ram3 < maxram
      method = 3;
      bits = bits3; ram = ram3;
   else
      method = 4;
      bits = bits4; ram = ram4;
      stepping = floor((maxram - ram) * bits / n / 0.000024);
   end

   bitinfo(1:pow2(bits), :) = uint8(0);         % the same as above, but created through
   for bit = 1 : bits                           % a loop, which consumes less memory
      bitinfo(1:pow2(bits)) = sum([bitinfo, ...
         kron(ones(pow2(bits-bit),1), [zeros(pow2(bit-1),1); ones(pow2(bit-1),1)])],2);
   end

elseif ram5 < maxram | ram6 < maxram
   if ram5 < maxram
      method = 5;
      bits = bits5; ram = ram5;
   else
      method = 6;
      bits = bits6; ram = ram6;
      stepping = floor((maxram - ram) * bits / n / 0.000024);
   end

else
   error(['Insufficient amount of memory. Allocate more memory to the system ', ...
          'or reduce the number of observations, then try again.'])
end

%%%%%%%%%%%%%%%%%%%%% Determination of dimensional distance EPSILON %%%%%%%%%%%%%%%%%%%%%%

% The empirical investigation by Kanzler (1998) shows that choosing EPSILON such that the
% first-order correlation integral is around 0.7 yields the most efficient estimation of
% low-dimensional BDS statistics. Hence the objective here is to choose EPSILON such that,
% say, 70% of all observations lie within distance EPSILON to each other. If desired, the
% programme first determines EPSILON so as to fulfil this or a similar requirement.
%
% The conceptually simplest way of setting up the calculation of distance among all
% observations is to define a two-dimensional table D (for "distance") of length and width
% N and assign to each co-ordinate (x,y) the result of the problem ABS(x-y).
%
% In principle, the entire table could thus be created with the following one-line
% statement:
%             D = ABS( SERIES(ONES(N,1),:)' - SERIES(ONES(N,1),:) )
%
% Since the lower triangle of the table only replicates the upper triangle and since the
% diagonal values represent the distance of each observation from itself, which is not
% to be included in the calculation, only the upper triangle receives further attention.
%
% Unfortunately, sewing all the row vectors of the upper triangle together to form one
% single (row) vector makes indexing very messy. To aid understanding of the vector-space
% indexing used here (as well as in the optional sub-function further below), one may wish
% to refer to the following exemplary matrix table (N=7):
%
%                                       Using this example, it is easy to verify
%          * * * *c o l u m n* * * *    that column vector I is defined by the
%     I    1   2   3   4   5   6   7    following indices in vector space:
%                                          I+(0 : I-2)*N - CUMSUM(1 : I-1)
%  *  1    *   1   2   3   4   5   6
%  *  2    .   *   7   8   9  10  11    More generally, column vector I starting only
%  r  3    .   .   *  12  13  14  15    in row J is:
%  o  4    .   .   .   *  16  17  18       I+(J-1 : I-2)*N - SUM(1:J-1)-CUMSUM(J : I-1)
%  w  5    .   .   .   .   *  19  20
%  *  6    .   .   .   .   .   *  21    Row vector I is given by indices:
%  *  7    .   .   .   .   .   .   *       1+(I-1)*(N-1)-SUM(1:I-2) : I*(N-1)-SUM(1:I-1)
%
% (A formal derivation of the above formulae is beyond the scope of this script.)
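%
% The formulae can be spot-checked against the table (an illustrative sketch, not part of
% the original script; N, I and J are set to hypothetical example values):
%     N = 7; I = 4;
%     I+(0:I-2)*N - cumsum(1:I-1)                       % column 4: 3 8 12
%     1+(I-1)*(N-1)-sum(1:I-2) : I*(N-1)-sum(1:I-1)     % row 4: 16 17 18
%     I = 5; J = 3;
%     I+(J-1:I-2)*N - sum(1:J-1)-cumsum(J:I-1)          % column 5 from row 3: 13 16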
%
% To calculate a percentile of the distribution of distance values, the row vector is
% sorted (unfortunately, this requires a lot of time and RAM in MATLAB).

if ~flag
   demeaned = series-sum(series)/n;                        % fastest algorithm for
   epsilon  = distance * sqrt(demeaned*demeaned'/(n-1));   % computing the standard
   clear demeaned % to save memory                         % deviation of SERIES
   
elseif 0.000008 * 3 * sum(1:n-1) < maxram % check memory requirements for DIST and sorting
   dist(1:sum(1:n-1)) = 0;
   for i = 1 : n-1
      dist(1+(i-1)*(n-1)-sum(0:i-2):i*(n-1)-sum(1:i-1)) = abs(series(i+1:n)-series(i));
   end
   sorted  = sort(dist);
   epsilon = sorted(max(1,round(distance*sum(1:n-1)))); % DISTANCEth percentile of SORTED
   clear dist sorted
else
   error('Insufficient RAM to compute EPSILON; allocate more memory or use FLAG = 0.')
end

%%%%%%%%%%%% Computation and storage of one-dimensional distance information %%%%%%%%%%%%

% Similarly to the above, a two-dimensional table C (for "close") of length and width N
% can be defined by assigning to each co-ordinate (x,y) the result of the problem
% ABS(x-y) <= EPSILON; (x,y) assumes the value 1 if the statement is true and 0 otherwise.
%
% Formally, for given EPSILON:
%                                 C(x,y) = 1 if ABS(x-y) <= EPSILON
%                                        = 0 otherwise
%
% Once again, the resulting information needs to be stored in the most efficient way.
% In this implementation, this is done by chopping each row of the table into "words" of
% several bits, the precise number of bits per word being determined by the above
% algorithms. One "word" is thus represented by one integer. This slashes the size of the
% table by the number of bits. See Kanzler (1998) for more details.
%
% The below routine stores all rows of the upper triangle of the conceptual table
% (described in Kanzler, 1998) left-aligned and assigns zeros to all other elements.
%
% As will also be explained further below, the computation of parameter K requires the sum
% of each FULL row, i.e. each row including the elements in the lower triangle and on the
% diagonal. The "missing" bits correspond to the sums over each column in the upper
% triangle, and these sums are also computed and stored in the below loop. And to make
% matters simple, diagonal values are allocated to the column sums by initialising them
% with value 1. See also Kanzler (1998).
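%
% A small worked example of the bit-word packing (an illustrative sketch with hypothetical
% values, not taken from Kanzler, 1998): with 3 bits per word, the closeness vector
% [1 0 1 1 0] is padded with one zero to two full words, [1 0 1] and [1 0 0]; with the
% first element as the least significant bit, these are stored as the integers 5 and 1:
%     b = 3; v = [1 0 1 1 0]; nw = ceil(length(v)/b);
%     (reshape([v, zeros(1,nw*b-length(v))], b, nw)' * pow2(0:b-1)')'   % returns [5 1]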

colsum(1:n)              = 1;
rowsum(1:n)              = 0;
nwords                   = ceil((n-1)/bits);
wrdmtrx(1:n-1,1:nwords)  = 0;                 % initialisation of bit-word table

for row = 1 : n-1
   bitvec                = abs(series(1+row:n) - series(row)) <= epsilon;
   rowsum(row)           = sum(bitvec);
   colsum(1+row:n)       = colsum(1+row:n) + bitvec;
   nwords                = ceil((n-row)/bits);
   wrdmtrx(row,1:nwords) = (reshape([bitvec,zeros(1,nwords*bits-n+row)],...%transformation
                                        bits, nwords)' *pow2(0:bits-1)')'; %into bit-words
end
clear series bitvec

%%%%%%%%%%%%%%%%%% Computation of one-dimensional correlation estimates %%%%%%%%%%%%%%%%%%

% C1(1), the fraction (or estimated probability) of pairs in SERIES being "close" in the
% first dimension is just the average over ALL unique elements. C1(1) is hence the most
% efficient estimator of C(1), and the resulting estimate is used in the computation of
% SIGMA(M) further below.
%
% However, for the difference term C(M) - C(1)^M of the BDS statistic (see further below)
% to follow SIGMA asymptotically, both C(M) and C(1) need to be estimated over the same
% length vector, and so MAXDIM different C1's need to be estimated here:
%
%                               N     N
%    C1(M) = 2/(N-M+1)/(N-M) * SUM   SUM  B(S,T)
%                              S=M  T=S+1
%