function [w, sig, c, c1, k] = bds(series, maxdim, distance, flag, maxram)
%BDS Brock, Dechert & Scheinkman test for independence based on the correlation dimension
%
% [W, SIG, C, C1, K] = BDS (SERIES, MAXDIM, DISTANCE, FLAG, MAXRAM)
%
% uses - time-series vector SERIES (1),
% - dimensional distance
% * either defined as fraction DISTANCE of the standard deviation of SERIES
% if FLAG = 0,
% * or defined such that the one dimensional correlation integral of SERIES
% is equal to DISTANCE if FLAG = 1 (2),
% - not more than MAXRAM megabytes of memory for the computation (3),
%
% to compute - BDS statistics W for each dimension between 2 and MAXDIM (4),
% - significance levels SIG at which the null hypotheses of no dependence are
% rejected ASYMPTOTICALLY (use companion function BDSSIG.M for finite
% samples) against (almost) any type of linear and non-linear dependence (5),
% - correlation integral estimates C for each dimension M between 2 and MAXDIM,
% - first-order correlation integral estimates C1 computed over the last N-M+1
% observations, and
% - parameter estimate K (6).
%
% (1) SERIES is normally a vector of residuals obtained from a regression, but it can also
% be any other stationary time series.
% (2) The default settings are DISTANCE = 1.5 and FLAG = 0. The BDS statistic appears to
% be most efficiently estimated if the measure of dimensional distance EPSILON is chosen
% such that the first-order correlation integral estimate (C1) lies around 0.7 (see
% Kanzler, 1998, forthcoming). For settings DISTANCE = 0.7 and FLAG = 1, the
% programme will choose EPSILON accordingly. Unfortunately, the cost of finding the optimal
% EPSILON is quite high in terms of CPU time and required memory. For a near-normal
% distribution, the default settings achieve the same without any extra computational
% burden.
% (3) The default setting is MAXRAM = 150, which is recommended for a system with 192MB
% physical RAM installed. The programme is highly optimised so as to maximise speed given
% available memory, so it is very important to specify MAXRAM correctly as the amount
% of physical memory available AFTER starting MATLAB, loading any data and running
% other applications concurrently. The smaller the amount of RAM available to the
% programme (in relation to the length of SERIES), the slower the algorithm chosen
% from six alternatives. However, if MAXRAM is chosen too large, MATLAB will make use
% of virtual (hard-disk) memory, and this will slow down computation considerably.
% (4) The default setting for MAXDIM is 2. For MAXDIM = 1, W and SIG are empty.
% (5) A vector of NaN is returned if the MATLAB Statistics Toolbox is not installed.
% (6) The BDS statistic W(M) is a function of C(M), C1(1), C1(M) and K, and these
% estimates are normally of no further interest.
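%
% Example (a sketch with simulated data; the series length and MAXRAM value below are
% illustrative assumptions, not recommendations):
%   x = randn(1000,1);                  % i.i.d. normal draws, so the null hypothesis holds
%   [w, sig] = bds(x, 4);               % defaults: DISTANCE = 1.5, FLAG = 0, MAXRAM = 150
%   [w, sig] = bds(x, 4, 0.7, 1, 100);  % EPSILON chosen such that C1(1) is about 0.7
% In both calls, W and SIG hold results for each dimension between 2 and 4.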
%
% See Kanzler (1998) for some explanation of the main parts of the algorithm (other
% explanations are commented into the below code), for a detailed investigation of the
% finite-sample properties of the BDS statistic, for tables of small-sample quantiles and
% for a comparison with software by Dechert (1988) and LeBaron (1988, 1990, 1997a, 1997b).
% These and other important references can be found at the end of the script.
%
% * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
% * All rights reserved. This script may be redistributed if it is left unaltered in *
% * its entirety (619 lines, 31422 bytes) and if nothing is charged for redistribution. *
% * Usage of the programme in applications and alterations of the code should be *
% * referenced properly. See http://users.ox.ac.uk/~econlrk for updated versions. *
% * The author appreciates suggestions for improvement or other feedback. *
% * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
%
% Copyright (c) 14 April 1998 by Ludwig Kanzler
% Department of Economics, University of Oxford
% Postal: Christ Church, Oxford OX1 1DP, England
% E-mail: ludwig.kanzler@economics.oxford.ac.uk
% $ Revision: 2.41 $ $ Date: 15 September 1998 $
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% % % % % % % % % % Executable part of main function BDS.M starts here % % % % % % % % %
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
%%%%%%%%%%%%%%%%%%%%%% Check and transformation of input arguments %%%%%%%%%%%%%%%%%%%%%%
if nargin < 5
    maxram = 150;
elseif maxram > 500
    disp('Are you sure you have so much memory available?')
    error('If so, you need to edit the code, otherwise try again with a lower value.')
end
if nargin < 4
    flag = 0;
elseif ~any(flag == [0 1])
    error('Unknown method for determining dimensional distance; try again with 0 or 1.')
end
if nargin < 3
    distance = 1.5;
elseif distance <= 0
    error('The dimensional distance parameter must be positive.')
elseif flag == 1 & distance > 1
    error('The correlation integral cannot exceed 1.')
end
if nargin < 2
    maxdim = 2;
elseif maxdim < 1
    error('The dimension needs to be at least 1.');
end
if nargin < 1
    error('Cannot compute the BDS statistic on nothing.')
end
[rows,cols] = size(series);
if rows > 1 & cols == 1
    n = rows;
    series = series';
elseif cols > 1 & rows == 1
    n = cols;
elseif cols > 1 & rows > 1
    n = cols*rows;
    series = series(:)';                     % transformation into a row vector
    disp(sprintf('\aTransformed matrix input into a single vector (columns stacked).'))
else
    error('Cannot compute the BDS statistic on a scalar!')
end
%%%%%%%%%%%% Determination of and preparations for fastest method given MAXRAM %%%%%%%%%%%
fastbuild = 0.000016 * (1:52) .* pow2(1:52); % memory requirements
slowbuild = 0.000045 * pow2(1:52); % for the various
holdinfo = 0.000005 * pow2(1:52); % algorithms in
wordtable = 0.000008 * n^2 ./ (1:52); % megabytes for
bitandop = 0.000024 * n^2 ./ (1:52); % given N
[ram1, bits1] = min(fastbuild + holdinfo + wordtable + bitandop); % number of bits for
[ram2, bits2] = min(fastbuild + holdinfo + wordtable); % which each of six
[ram3, bits3] = min(slowbuild + holdinfo + wordtable + bitandop); % methods uses minimum
[ram4, bits4] = min(slowbuild + holdinfo + wordtable); % memory; this memory
[ram5, bits5] = min( wordtable + bitandop); % is given by
[ram6, bits6] = min( wordtable); % ram1, ram2,..., ram6
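% Illustration of the trade-off above (hypothetical N = 1000, not executed): the word table
% alone needs WORDTABLE(BITS) = 0.000008*1000^2/BITS = 8/BITS megabytes, i.e. 0.5 MB with
% 16-bit words, whereas FASTBUILD(16) = 0.000016*16*2^16, about 16.8 MB, does not depend
% on N. Longer words thus shrink the table but enlarge the memory needed to build and hold
% the bit-counting table.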
if ram1 < maxram | ram2 < maxram
    if ram1 < maxram
        method = 1;
        bits = bits1; ram = ram1;
    else
        method = 2;
        bits = bits2; ram = ram2;
        % STEPPING is the maximum number of rows that can be put through BITAND and the
        % bit-counting algorithm without exceeding MAXRAM
        stepping = floor((maxram-ram)*bits/n/0.000024);
    end
    % Vector BITINFO lists the number of bits set for each integer between 0 and
    % 2^bits-1 (BITINFO(K+1) holds the count for integer K). See Kanzler (1998) for an
    % explanation.
    bitinfo = uint8(sum(rem(floor((0:pow2(bits)-1)'*pow2(1-bits:0)),2),2));
elseif ram3 < maxram | ram4 < maxram
    if ram3 < maxram
        method = 3;
        bits = bits3; ram = ram3;
    else
        method = 4;
        bits = bits4; ram = ram4;
        stepping = floor((maxram - ram) * bits / n / 0.000024);
    end
    % the same as above, but created through a loop, which consumes less memory
    bitinfo(1:pow2(bits), :) = uint8(0);
    for bit = 1 : bits
        bitinfo(1:pow2(bits)) = sum([bitinfo, ...
            kron(ones(pow2(bits-bit),1), [zeros(pow2(bit-1),1); ones(pow2(bit-1),1)])],2);
    end
elseif ram5 < maxram | ram6 < maxram
    if ram5 < maxram
        method = 5;
        bits = bits5; ram = ram5;
    else
        method = 6;
        bits = bits6; ram = ram6;
        stepping = floor((maxram - ram) * bits / n / 0.000024);
    end
else
    disp('Insufficient amount of memory. Allocate more memory to the system')
    disp('or reduce the number of observations, then try again.')
    error(' ')
end
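% Illustration only (a hypothetical snippet, not part of the algorithm): with BITS = 3 the
% lookup table built above for methods 1 to 4 is
%   bitinfo = uint8([0; 1; 1; 2; 1; 2; 2; 3])
% so the number of bits set in a vector of bit-words WORDS can be counted as
%   count = sum(double(bitinfo(words + 1)));
% e.g. WORDS = [5 7] (binary 101 and 111) gives COUNT = 2 + 3 = 5.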
%%%%%%%%%%%%%%%%%%%%% Determination of dimensional distance EPSILON %%%%%%%%%%%%%%%%%%%%%%
% The empirical investigation by Kanzler (1998) shows that choosing EPSILON such that the
% first-order correlation integral is around 0.7 yields the most efficient estimation of
% low-dimensional BDS statistics. Hence the objective here is to choose EPSILON such that,
% say, 70% of all pairs of observations lie within distance EPSILON of each other. If
% desired, the programme first determines EPSILON so as to fulfil this or a similar requirement.
%
% The conceptually simplest way of setting up the calculation of distance among all
% observations is to define a two-dimensional table D (for "distance") of length and width
% N and assign to each co-ordinate (x,y) the distance ABS(x-y) between observations x and y.
%
% In principle, the entire table could thus be created with the following one-line
% statement:
% D = ABS( SERIES(ONES(N,1),:)' - SERIES(ONES(N,1),:) )
%
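% For instance, for the (hypothetical) series SERIES = [1 4 2], i.e. N = 3, this gives
%   D = [0 3 1; 3 0 2; 1 2 0]
%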
% Since the lower triangle of the table only replicates the upper triangle and since the
% diagonal values are each observation's distance to itself (zero), which should not be
% included in the calculation, only the upper triangle receives further attention.
%
% Unfortunately, sewing all the row vectors of the upper triangle together to form one
% single (row) vector makes indexing very messy. To aid understanding of the vector-space
% indexing used here (as well as in the optional sub-function further below), one may wish
% to refer to the following exemplary matrix table (N=7):
%
%                    * * * c o l u m n * * *
%              I     1    2    3    4    5    6    7
%         *    1     *    1    2    3    4    5    6
%         *    2     .    *    7    8    9   10   11
%         r    3     .    .    *   12   13   14   15
%         o    4     .    .    .    *   16   17   18
%         w    5     .    .    .    .    *   19   20
%         *    6     .    .    .    .    .    *   21
%         *    7     .    .    .    .    .    .    *
%
% Using this example, it is easy to verify that
% * column vector I is defined by the following indices in vector space:
%       I+(0:I-2)*N - CUMSUM(1:I-1)
% * more generally, column vector I starting only in row J is:
%       I+(J-1:I-2)*N - SUM(1:J-1) - CUMSUM(J:I-1)
% * row vector I is given by the indices:
%       1+(I-1)*(N-1)-SUM(1:I-2) : I*(N-1)-SUM(1:I-1)
%
% (A formal derivation of the above formulae is beyond the scope of this script.)
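%
% As a quick numerical check (an illustrative snippet only, not used by the programme),
% for N = 7, I = 4 and J = 2 the commands
%   N = 7; I = 4; J = 2;
%   I+(0:I-2)*N - cumsum(1:I-1)                     % returns [3 8 12]
%   I+(J-1:I-2)*N - sum(1:J-1) - cumsum(J:I-1)      % returns [8 12]
%   1+(I-1)*(N-1)-sum(1:I-2) : I*(N-1)-sum(1:I-1)   % returns [16 17 18]
% reproduce column 4, column 4 from row 2 onwards, and row 4 of the table.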
%
% To calculate a percentile of the distribution of distance values, the row vector is
% sorted (unfortunately, this requires a lot of time and RAM in MATLAB).
if ~flag
    % fastest algorithm for computing the standard deviation of SERIES
    demeaned = series-sum(series)/n;
    epsilon = distance * sqrt(demeaned*demeaned'/(n-1));
    clear demeaned                                   % to save memory
elseif 0.000008 * 3 * sum(1:n-1) < maxram           % check memory requirements for DIST and sorting
    dist(1:sum(1:n-1)) = 0;
    for i = 1 : n-1
        dist(1+(i-1)*(n-1)-sum(0:i-2):i*(n-1)-sum(1:i-1)) = abs(series(i+1:n)-series(i));
    end
    sorted = sort(dist);
    % (100*DISTANCE)th percentile of the SORTED distances (at least the first element)
    epsilon = sorted(max(1, round(distance*sum(1:n-1))));
    clear dist sorted
else
    error('Insufficient RAM to compute EPSILON; allocate more memory or use FLAG = 0.')
end
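% Illustration only (hypothetical data, not executed here): for SERIES = [1 4 2 8] the six
% pairwise distances, sorted, are [1 2 3 4 6 7]. With FLAG = 1 and DISTANCE = 0.5,
% ROUND(0.5*6) = 3 and hence EPSILON = 3, so half of all pairs lie within EPSILON of each
% other. With FLAG = 0 and DISTANCE = 1.5, EPSILON = 1.5*STD(SERIES), about 4.64, and four
% of the six pairs (about 0.67) are "close".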
%%%%%%%%%%%% Computation and storage of one-dimensional distance information %%%%%%%%%%%%
% Similarly to the above, a two-dimensional table C (for "close") of length and width N
% can be defined by assigning to each co-ordinate (x,y) the truth value of the statement
% ABS(x-y) <= EPSILON: (x,y) takes the value 1 if the statement is true and 0 otherwise.
%
% Formally, for given EPSILON:
% C(x,y) = 1 if ABS(x-y) <= EPSILON
% = 0 otherwise
%
% Once again, the resulting information needs to be stored in the most efficient way.
% In this implementation, this is done by chopping each row of the table into "words" of
% several bits, the precise number of bits per word being determined by the above
% algorithms. One "word" is thus represented by one integer. This shrinks the table by a
% factor equal to the number of bits per word. See Kanzler (1998) for more details.
%
% The below routine stores all rows of the upper triangle of the conceptual table
% (described in Kanzler, 1998) left-aligned and assigns zeros to all other elements.
%
% As will also be explained further below, the computation of parameter K requires the sum
% of each FULL row, i.e. each row including the elements in the lower triangle and on the
% diagonal. The "missing" bits correspond to the sums over each column in the upper
% triangle, and these sums are also computed and stored in the below loop. And to make
% matters simple, diagonal values are allocated to the column sums by initialising them
% with value 1. See also Kanzler (1998).
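% Illustration only (hypothetical data, not executed here): for SERIES = [1 4 2 8] and
% EPSILON = 3, the loop below yields ROWSUM = [2 1 0 0] and COLSUM = [1 2 3 1], so the full
% row sums ROWSUM + COLSUM = [3 3 3 1] count, for each observation, the observations
% (itself included) lying within EPSILON of it.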
colsum(1:n) = 1;
rowsum(1:n) = 0;
nwords = ceil((n-1)/bits);
wrdmtrx(1:n-1,1:nwords) = 0; % initialisation of bit-word table
for row = 1 : n-1
    bitvec = abs(series(1+row:n) - series(row)) <= epsilon;
    rowsum(row) = sum(bitvec);
    colsum(1+row:n) = colsum(1+row:n) + bitvec;
    nwords = ceil((n-row)/bits);
    % transformation into bit-words
    wrdmtrx(row,1:nwords) = (reshape([bitvec,zeros(1,nwords*bits-n+row)], ...
        bits, nwords)' * pow2(0:bits-1)')';
end
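% Illustration only (hypothetical values, not executed here): with BITS = 3, a row
% BITVEC = [1 0 1 1 0] is padded to [1 0 1 1 0 0], split into the words [1 0 1] and
% [1 0 0], and weighted by POW2(0:2) = [1 2 4], giving the integers [5 1]; the first
% element of each word is thus its least significant bit.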
clear series bitvec
%%%%%%%%%%%%%%%%%% Computation of one-dimensional correlation estimates %%%%%%%%%%%%%%%%%%
% C1(1), the fraction (or estimated probability) of pairs in SERIES being "close" in the
% first dimension, is just the average over ALL unique elements. C1(1) is hence the most
% efficient estimator of C(1), and the resulting estimate is used in the computation of
% SIGMA(M) further below.
%
% However, for the difference term C(M) - C(1)^M of the BDS statistic (see further below)
% to follow SIGMA asymptotically, both C(M) and C(1) need to be estimated over vectors of
% the same length, and so MAXDIM different C1's need to be estimated here:
%
%   C1(M) = 2/(N-M+1)/(N-M) * SUM(S = M..N) SUM(T = S+1..N) B(S,T)
%
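% Illustration only (hypothetical data, not executed here): for SERIES = [1 4 2 8] and
% EPSILON = 3, the close pairs are (1,2), (1,3) and (2,3), so C1(1) = 2/(4*3) * 3 = 0.5.
% C1(2) uses only S,T >= 2, i.e. the pairs (2,3), (2,4) and (3,4), of which only (2,3) is
% close, so C1(2) = 2/(3*2) * 1 = 1/3.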