%stringkerneltest2.m
%This file demonstrates the use of a simple string kernel in characterising the similarity of protein sequences
%Arthur Gretton
%28/07/03

clear
close

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%The parameter below controls the length of the subsequences being compared.
%Change it to see how this affects performance in protein classification, for
%both the spectrum and mismatch kernels.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
seqLength = 4; %sequence length for contiguous substring

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Load the data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
totalStringNum = 60+44;
proteinStrings = cell(totalStringNum,1);
labels = [ones(60,1);-ones(44,1)];
stringIndex = 1;

class1_data = textread('protein_1.txt','%s','delimiter','\n');
class1_data (end+1) = cellstr('>');
class2_data = textread('protein_2.txt','%s','delimiter','\n');
class2_data (end+1) = cellstr('>');

l=1;
while l <= length(class1_data)
  newline = char(class1_data(l));
  if isempty(newline)
    l=l+1;
  elseif newline(end)=='}' | newline(1)=='>'
    l=l+1;
  else
    newClass1String=[];
    while newline(1)~='>'
      newClass1String = [newClass1String newline];
      l=l+1;
      newline = char(class1_data(l));
      if isempty(newline)
        newline='>';
      end
    end
    proteinStrings(stringIndex)=cellstr(newClass1String);
    stringIndex=stringIndex+1;
  end
end

l=1;
while l <= length(class2_data)
  newline = char(class2_data(l));
  if isempty(newline)
    l=l+1;
  elseif newline(end)=='}' | newline(1)=='>'
    l=l+1;
  else
    newClass2String=[];
    while newline(1)~='>'
      newClass2String = [newClass2String newline];
      l=l+1;
      newline = char(class2_data(l));
      if isempty(newline)
        newline='>';
      end
    end
    proteinStrings(stringIndex)=cellstr(newClass2String);
    stringIndex=stringIndex+1;
  end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Compute the Gram matrix for the simple kernel and mismatch kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[K]=simpleStringKernel(seqLength,proteinStrings);
[K_mismatch]=mismatchStringKernel(seqLength,proteinStrings);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Compute the cross validation error for string kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%The following clears the memory of the large sequenceFeatures matrix.
sequenceFeatures=labels;
d=data(sequenceFeatures,labels);
k=kernel('custom',K);
a=cv(svm(k));
a.train_on_fold=1; a.folds=5;
r=train( a , d) ; get_mean( r)

d=subplot(2,2,1)
stem( group2vec(loss(r)))
xlabel('Cross validation index');
ylabel('Error');
title('Basic string kernel')

d=subplot(2,2,2)
imagesc(K);
title('Basic string kernel matrix')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Compute the cross validation error for mismatch kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
d=data(sequenceFeatures,labels);
k_mismatch=kernel('custom',K_mismatch);
a=cv(svm(k_mismatch));
a.train_on_fold=1; a.folds=5;
r_mismatch=train( a , d) ; get_mean( r_mismatch)

d=subplot(2,2,3)
stem( group2vec(loss(r_mismatch)))
xlabel('Cross validation index');
ylabel('Error');
title('Mismatch kernel')

d=subplot(2,2,4)
imagesc( K_mismatch)
title('Mismatch kernel matrix')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%printing commands
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%set(d,'fontsize',16)
%set(get(d,'Parent'),'PaperPosition',[0 0 9.6 4.8])
%set(get(d,'Parent'),'PaperPosition',[0 0 6 6])
%print -depsc2 mismatchStringKernel.eps
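
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Illustrative sketch, not part of the original script.
%simpleStringKernel is called above but not defined in this file. The lines
%below only assume that it behaves like an unnormalised spectrum kernel:
%K(i,j) = sum over all contiguous substrings u of length seqLength of
%(count of u in string i) * (count of u in string j).
%The real function may normalise or weight differently. All names starting
%with "toy" are hypothetical and the two toy strings are made up.
%Requires containers.Map, which is not available in very old MATLAB releases.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
toyStrings = {'ABCABC'; 'BCAB'}; %made-up sequences, not protein data
toyLen = 3;                      %plays the role of seqLength
toyNum = numel(toyStrings);

%Count every contiguous substring of length toyLen in each string
toyCounts = cell(toyNum,1);
for ti = 1:toyNum
  toyS = toyStrings{ti};
  toyMap = containers.Map('KeyType','char','ValueType','double');
  for p = 1:(length(toyS)-toyLen+1)
    toySub = toyS(p:p+toyLen-1);
    if isKey(toyMap,toySub)
      toyMap(toySub) = toyMap(toySub)+1;
    else
      toyMap(toySub) = 1;
    end
  end
  toyCounts{ti} = toyMap;
end

%Gram matrix entry = inner product of the two substring-count vectors
toyK = zeros(toyNum);
for ti = 1:toyNum
  toyMapI = toyCounts{ti};
  toyKeysI = keys(toyMapI);
  for tj = 1:toyNum
    toyMapJ = toyCounts{tj};
    for m = 1:numel(toyKeysI)
      if isKey(toyMapJ,toyKeysI{m})
        toyK(ti,tj) = toyK(ti,tj) + toyMapI(toyKeysI{m})*toyMapJ(toyKeysI{m});
      end
    end
  end
end

%'ABCABC' contains ABC twice, BCA once, CAB once; 'BCAB' contains BCA and CAB once.
%Hence toyK = [6 2; 2 2].
disp(toyK)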