📄 p_spectrum.m
function [result, K] = p_spectrum(s,t,p)
%P_SPECTRUM
% -Finds the contiguous subsequence match count between strings s and t
%  by using a dynamic programming implementation,
%  where the length of the subsequence is p.
%  (There is also a brute force implementation of this algorithm.
%  Type help p_spectrum_bf for info.)
%
% -Calling the function with one output returns the value K(s,t); calling it
%  as [result,K] = p_spectrum(s,t,p) also returns the matrix K, where K(i,j)
%  is the kernel value for the prefixes s(1:i) and t(1:j).
%
% -The following algorithm is used:
%    K[p](sa,t)  = K[p](s,t) + [Summation of i from 1 to |t|] G[p-1](s,t(1:i-1)) [t(i) == a]
%    K[p](s,t)   = 0 if |s| < p or |t| < p
%    G[p](sa,tb) = G[p-1](s,t)[a==b]
%    G[0](s,t)   = 1 for all s,t
%    G[p](s,t)   = 0 if |s| == 0 or |t| == 0 (for p > 0)
%
% -Example: p_spectrum('abccc','abc', 3) returns a value of 1.
%  (Note that p_spectrum('abccc','abc',3) = p_spectrum('abc','abccc',3) since K(s,t,p) = K(t,s,p).)
% -Example: p_spectrum('a','a', 1) returns a value of 1.
% -Example: p_spectrum('a','b', 1) returns a value of 0.
% -Example: p_spectrum('ab','ab', 2) returns a value of 1.
%
%USAGE: scalar = p_spectrum('string1','string2', p); (where p is the length of the substring)
%       [scalar, matrix] = p_spectrum('string1','string2', p);
%
%For more information, visit http://www.kernel-methods.net/
%Written and tested in Matlab 6.0, Release 12.
%Copyright 2003, Manju M. Pai 4/2003
%manju@kernel-methods.net
%------------------------------------------------------------------------------------------

%Obtain lengths of strings
[num_rows_s, n] = size(s);
[num_rows_t, m] = size(t);

%Error checking statements (done before any matrices are allocated):
    %Make sure input vectors are horizontal.
    if (num_rows_s ~= 1 | num_rows_t ~= 1)
        error('Error: s and t must be horizontal vectors.');
    end;
    %If p is not a number or is not greater than zero, program should quit due to faulty variable input.
    if ischar(p) | p <= 0
        error('Error: p needs to be a number greater than 0.');
    end;
%End of error checking

%Initially set every matrix index to -1 to show value has not yet been found
K = repmat(-1, [n, m]);     %The main kernel
G = repmat(-1, [n, m, p]);  %The suffix kernel

%Fill in the matrix using the function p_spectrum_kernel()
for i=1:n
    for j=1:m
        [K(i,j), G] = p_spectrum_kernel(s(1:i), t(1:j), K, G, p);
    end;
end;
result = K(n,m);

%------------------------------------------------------------------------------------------
function [val, G] = p_spectrum_kernel(sa, t, K, G, p)
%This function is called by p_spectrum(s,t,p).
%Type 'help p_spectrum' for a description of the program.
%
%------------------------------------------------------------------------------------------

%Obtain lengths of both strings
n = length(sa);
m = length(t);

%truncate last character of string and obtain length of new string
s = sa(1:n-1);
length_s = length(s);

%Start algorithm:
%   1) Split main algorithm into two parts:
%       a) K(s,t)
    if (length(s) < p) | (length(t) < p)
        %This is a base case where 0 is returned if either string is shorter than p
        val = 0;
    elseif ( K( length(s), length(t) ) == -1 )
        % Value has not yet been calculated
        val = p_spectrum_kernel(s, t, K, G, p);
    else
        % Value has already been calculated
        val = K( length(s), length(t) );
    end;

%       b) Summation of G[p-1](s,t(1:i-1))[t(i) == a] over i = 1:|t|
    %this is the letter (a) that was truncated off the string
    letter = sa(n);
    %pos_array acts as a cursor over the t string:
    %it consists of all indices i of t where t(i) == a
    pos_array = find(t(1:m) == letter);
    for index = 1:length(pos_array)
        i = pos_array(index);
        length_t = length(t(1:(i-1)));
        if ( (p-1) == 0 )
            %This is a base case where 1 is returned since G[0] = 1
            result = 1;
        elseif (length_s == 0 | length_t == 0)
            %This is a base case where 0 is returned if either string has length 0
            result = 0;
        elseif ( G( length_s, length_t, (p-1) ) == -1 )
            % Value has not yet been calculated
            [result, G] = suffix_kernel(s, t(1:(i-1)), G, (p-1));
        else
            % Value has already been calculated
            result = G( length_s, length_t, (p-1) );
        end;
        val = val + result;
    end;
return
% End of algorithm

%------------------------------------------------------------------------------------------
function [val, G] = suffix_kernel(sa, tb, G, p)
%This function is called by p_spectrum(s,t,p).
%Type 'help p_spectrum' for a description of the program.
%
%------------------------------------------------------------------------------------------

%Obtain lengths of both strings
n = length(sa);
m = length(tb);

%if last characters of both strings do not match, return 0
%(case-sensitive comparison, consistent with the t(i) == a test in p_spectrum_kernel)
if sa(n) ~= tb(m)
    val = 0;
    return
end;

%truncate last character of strings
s = sa(1:n-1);
t = tb(1:m-1);

%Obtain lengths of truncated strings
length_s = length(s);
length_t = length(t);

%Start algorithm: G[p](sa,tb) = G[p-1](s,t)[a==b]
if ((p-1) == 0)
    %This is a base case where 1 is returned if G[p-1] = G[0]
    val = 1;
elseif (length_s == 0) | (length_t == 0)
    %This is a base case where 0 is returned if either string has length 0
    val = 0;
elseif ( G( length_s, length_t, (p-1) ) == -1 )
    % Value has not yet been calculated
    [val, G] = suffix_kernel(s, t, G, (p-1));
    G( length_s, length_t, (p-1) ) = val;
else
    % Value has already been calculated
    val = G( length_s, length_t, (p-1) );
end;
return
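
%------------------------------------------------------------------------------------------
%The recurrence documented in 'help p_spectrum' counts the pairs of positions
%(i,j) at which s and t share the same length-p contiguous substring. The
%function below is a minimal brute-force sketch of that count, intended only
%as a cross-check on short inputs. It is NOT part of the original file: the
%name p_spectrum_check is hypothetical, and it is not the p_spectrum_bf
%implementation mentioned in the help text.
function count = p_spectrum_check(s, t, p)
%P_SPECTRUM_CHECK (hypothetical helper) Brute-force count of matching
% length-p contiguous substrings of s and t. Assumes s and t are row
% vectors of characters and p is a positive integer.
n = length(s);
m = length(t);
count = 0;
%Both loops are empty (count stays 0) when a string is shorter than p
for i = 1:(n - p + 1)
    for j = 1:(m - p + 1)
        %Compare the two length-p windows character by character (case-sensitive)
        if all( s(i:(i+p-1)) == t(j:(j+p-1)) )
            count = count + 1;
        end;
    end;
end;
return
%For the documented example, p_spectrum_check('abccc','abc',3) compares the
%windows 'abc', 'bcc' and 'ccc' of s against the single window 'abc' of t,
%so exactly one pair matches.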