function [Fhat,Ehat,Et] = bnBestFit(X,Y,w,k,F,bi,Xt,Yt,wt)

% [Fhat,Ehat,Et] = bnBestFit(X,Y,w,k,F,bi,Xt,Yt,wt) - Best-Fit inference
%
% bnBestFit performs a regulatory network inference for a set of variables 
% (genes) under the Boolean network model. The function returns the
% Best-Fit function and the corresponding (non-normalized) error-size for
% all input variable combinations in X and for all the target variables in
% Y. That is, rows in X correspond to predictor variables, and rows in Y
% correspond to target variables. The i:th variable at the j:th sample (or
% time point), Y(i,j), is predicted based on the value of the predictor 
% variables at the same sample X(:,j). Note that if unity weights (defined 
% in w) are used for all the samples then the found functions are equal to 
% the ones which minimize the resubstitution error on the sample data (i.e.,
% histogram rule for discrete data). In such a case, the corresponding 
% error-sizes in Ehat are the (non-normalized) minimum resubstitution
% errors, i.e., the number of errors/misclassifications. In case of a tie, a
% randomly selected function with minimum error-size is returned. If
% additional test data sets (Xt, Yt, and wt) are provided then the
% error-size of the found Best-Fit functions on the test data is computed
% as well. (This can be useful in the case of different cross-validation
% and bootstrap experiments.)
%
% INPUT:
% X     - Binary input matrix. X(i,:) corresponds to the (binary) values of
%         the i:th predictor variable. Correspondingly, X(:,j) represents 
%         the values of the predictor variables for the j:th sample (or the
%         j:th time point).
% Y     - Binary output matrix. Y(i,:) corresponds to the (binary) values 
%         of i:th target variable. Correspondingly, Y(:,j) represents the 
%         values of the target variables for the j:th sample. In other 
%         words, Y(i,j) is the value of the i:th target variable in the 
%         j:th sample, i.e., the bit that is to be predicted based on
%         X(:,j) (this holds for all i and j).
% w     - Weight vector containing positive weights for the measurements in
%         X and Y. For now, the implementation only allows to define a
%         single weight for each column in X (and Y). In particular, the
%         weight w(i) defines the weight for the i:th input vector (and the
%         corresponding output).
% k     - The (maximum) number of variables in the predictor functions,
%         i.e., indegree.
% F     - The set of Boolean predictors to be used in the inference. If F 
%         is the empty matrix, then the function class is considered to 
%         contain all k-variable Boolean functions. F is either a
%         (2^k)-by-nf binary matrix or 1-by-nf row vector of integers (see
%         also the description of the next argument bi), where k is the 
%         number of variables in each function and nf is the number of 
%         functions.
%         1.    The case of (2^k)-by-nf binary matrix: Let f = F(:,j) be
%               the j:th column of F (i.e., the j:th truth table in F).
%               Then, f(1) defines the output value for the input vector
%               00...00, f(2) for the input 00...01, f(3) for the input 
%               00...10, ..., and f(2^k) for the input 11...11. Input 
%               vectors are interpreted such that the left most bit defines
%               the value of the first input variable, the second bit from 
%               the left defines the value of the second input variable, 
%               ..., and the right most bit defines the value of the last 
%               (k:th) input variable.
%         2.    The case of 1-by-nf row vector of integers: Let f = F(j) be
%               the j:th element of F. Then, the first bit (as obtained by 
%               the bitget command bitget(f,1)) defines the output value 
%               for the input vector 00...00, the second bit bitget(f,2) 
%               defines the output for the input 00...01, ..., and the 
%               2^k:th bit bitget(f,2^k) defines the output for the input
%               11...11. Thus, the regular truth table presentation of the
%               j:th function can be obtained by f = bitget(F(j),[1:2^k])'.
%               Note that only the cases k<=5 can be handled by this
%               convention.
% bi    - A bit (0/1) indicating whether the Boolean functions in F 
%         are represented in the form of standard binary truth tables (0) 
%         or (encoded) integers (1). (This can be used to distinguish
%         between constant functions and the integer presentations.)
% Xt    - [Optional] Input data for a separate test data. Format is the
%         same as for the matrix X (see above).
% Yt    - [Optional] Output data for a separate test data. Format is the
%         same as for the matrix Y (see above).
% wt    - [Optional] Weights for a separate test data. Format is the same
%         as for the vector w (see above).
%
% OUTPUT:
% Fhat  - A 3-D binary matrix of the Best-Fit functions for each input 
%         variable combination and for all target variables. Fhat has size
%         (2^k)-by-nchoosek(n,k)-by-ni, where n is the size of the first
%         dimension of matrix X (i.e., the number of predictor variables) 
%         and ni is the number of target variables. Fhat(:,:,i) defines the
%         Best-Fit functions for the i:th node. In particular, Fhat(:,j,i) 
%         defines the Best-Fit function for the i:th variable and for the 
%         j:th variable combination (the j:th variable combination 
%         corresponds to the variables on the j:th row of the matrix
%         nchoosek([1:n],k)). Each column in Fhat(:,:,i) is interpreted
%         like the columns in the binary matrix F (see case 1 above).
% Ehat  - The error-size of the Best-Fit function for all input variable
%         combinations and for all the target variables. Ehat has size
%         nchoosek(n,k)-by-ni. Thus, Ehat(i,j) is the error-size of the
%         Best-Fit function for i:th input variable combination and for the
%         j:th target node.
% Et    - This variable is returned only if Xt, Yt and wt are present in
%         the input. The error-size of the Best-Fit function for all input
%         variable combinations and for all the target variables on the 
%         separate test data.
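%
% EXAMPLE (a minimal sketch; the toy data below are invented for
% illustration and are not part of the original toolbox):
%   X = [0 0 1 1; 0 1 0 1];          % two predictors, four samples
%   Y = double(xor(X(1,:),X(2,:)));  % one target: XOR of the two predictors
%   w = ones(1,4);                   % unity weights (row vector)
%   [Fhat,Ehat] = bnBestFit(X,Y,w,2,[],0);
%   % With the truth-table convention described above, Fhat(:,1,1) should
%   % be [0 1 1 0]' and Ehat should be 0 (XOR fits the data exactly).
%   % The same truth table in the integer encoding is 6, since
%   % bitget(6,1:4)' is also [0 1 1 0]'.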

% 03.04.2003 by Harri Lähdesmäki, modified from bnBestFit.
% Modified: May 14, 2003 by HL.
%           25/08/2005 by HL.


%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% Define and initialize some variables.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[n,m] = size(X); % The number of predictor genes and the number of measurements.
ni = size(Y,1); % The number of (target) genes.

b = 2.^[k-1:-1:0]'; % Powers of two (used in binary-to-decimal conversions).

W = ones(ni,1)*w; % Weights in a matrix form (assume w is a row vector).

kk = 2^k; % Two to the power of k (needed often).

combnum = nchoosek(n,k); % The number of different variable combinations.

% Generate all variable combinations in advance. This will work only for
% moderately small data sets. If one wants to use larger data sets, then
% the input variable combinations can be generated using the function
% nextnchoosek.m (e.g., given an input variable combination, I =
% nextnchoosek(I,n); generates the next variable combination in
% lexicographical order).
if combnum>20000 % Limit the number of possible combinations.
    error('Too many variable combinations. Modify the code a little bit...')
end % if combnum>20000
IAll = nchoosek([1:n],k);

% Modify the variables below if only a subset of all combinations are to be
% checked (e.g. in the case of parallelizing the code...)
starti = 1;
stopi = combnum;

% Initialize the output matrix/matrices.
Fhat = zeros(kk,combnum,ni);
Ehat = zeros(combnum,ni);

% Check whether additional test data is available.
TestBit = 0;
if nargin==9
    Et = zeros(combnum,ni);
    Wt = ones(ni,1)*wt; % Same as repmat(wt,ni,1); test weights in matrix form.
    TestBit = 1;
end % if nargin==9


%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% The main loop separately for unconstrained and constrained case.
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

% If F is an empty matrix, then the function class is considered to contain
% all k-variable Boolean functions (unconstrained case).
if isempty(F)
    
    % Two times two to the power of k (needed often).
    kkk = 2*kk;
    
    % This matrix (C01) has the role of c^(0) and c^(1) for all interesting
    % genes. Further, C01 = [c^(0),c^(1)];
    C01 = zeros(ni,kkk);
    %sC01 = size(C01);
        
    %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    % Run through all variable combinations.
    %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    for i=starti:stopi
        
        % The current variable combinations (in lexicographical ordering).
        I = IAll(i,:);
        
        % Initialize again.
        C01 = zeros(ni,kkk);
        
        %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        % Run through all measurements.
        %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        % This loop also takes into account possible multiplicities in
        % measurements, i.e., it accumulates the weights of those
        % measurements that appear several times in T (and/or F).
        for j=1:m
            
            % The current input as a decimal number to be used to index the
            % matrix C01.
            dn = X(I,j)'*b + 1;
            %dn = sum(bitset(0,t1(logical(D(I,IO(1,j)))))) + 1;
            %dn = binarr2dec(D(I,IO(1,j))',b) + 1;
                        
            % Update C01 (c^(0) and c^(1)). First update left half (C0) and
            % then right half (C1)
            C01(logical(1-Y(:,j)),dn) = C01(logical(1-Y(:,j)),dn) + w(j);
            C01(logical(Y(:,j)),kk+dn) = C01(logical(Y(:,j)),kk+dn) + w(j);
            
        end % for j=1:m
        
        % Find the Best-Fit function for all the nodes.
        [OptErr,OptF] = min(cat(3,C01(:,1:kk),C01(:,kk+1:end)),[],3);
        OptF = 2 - OptF';
        
        % All output bits having tie are set uniformly randomly. This also
        % takes care of the undefined bits due to the initialization of
        % matrix C01.
        Ties = (C01(:,1:kk)==C01(:,kk+1:end))';
        OptF(Ties) = (rand(1,sum(Ties(:))))>0.5;
        
        % Store the Best-Fit functions.
        Fhat(:,i,:) = OptF;
        % Store the corresponding (weighted) error-size.
        Ehat(i,:) = sum(OptErr,2)';
        
        
        if TestBit % If the test data is provided
            %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
            % Apply the Best-Fit functions to the test data.
            %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
            % All the inputs on the test data as decimal numbers.
            dn = Xt(I,:)'*b + 1;
            
            % Output values of the current functions for all the inputs.