
stringkerneltest2.m

%This file demonstrates the use of a simple string kernel in characterising the similarity of protein sequences.
%Arthur Gretton
%28/07/03

clear
close

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%The parameter below controls the length of the subsequences being compared.
%Change it to see how this affects performance in protein classification, for
%both the spectrum and mismatch kernels.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

seqLength = 4;   %length of the contiguous substrings compared

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Load the data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%60 sequences in class +1 (protein_1.txt), 44 in class -1 (protein_2.txt)
totalStringNum = 60+44;
proteinStrings = cell(totalStringNum,1);
labels = [ones(60,1);-ones(44,1)];
stringIndex = 1;

%Read each file line by line and append a '>' sentinel so the last sequence is closed.
class1_data = textread('protein_1.txt','%s','delimiter','\n');
class1_data(end+1) = cellstr('>');
class2_data = textread('protein_2.txt','%s','delimiter','\n');
class2_data(end+1) = cellstr('>');

%Concatenate the lines between '>' header lines into one string per protein.
%Empty lines, header lines and lines ending in '}' are skipped.
%(currentLine replaces the name newline, which shadows a built-in function.)
for classData = {class1_data, class2_data}
  fileLines = classData{1};
  l = 1;
  while l <= length(fileLines)
    currentLine = char(fileLines(l));
    if isempty(currentLine)
      l = l+1;
    elseif currentLine(end)=='}' || currentLine(1)=='>'
      l = l+1;
    else
      newString = [];
      while currentLine(1)~='>'
        newString = [newString currentLine];
        l = l+1;
        currentLine = char(fileLines(l));
        if isempty(currentLine)
          currentLine = '>';   %a blank line also ends the sequence
        end
      end
      proteinStrings(stringIndex) = cellstr(newString);
      stringIndex = stringIndex+1;
    end
  end
end

%Guard against the hard-coded class sizes disagreeing with the files.
if stringIndex-1 ~= totalStringNum
  error('Expected %d sequences but read %d.', totalStringNum, stringIndex-1);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Compute the Gram matrices for the simple (spectrum) kernel and the mismatch kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

K = simpleStringKernel(seqLength,proteinStrings);
K_mismatch = mismatchStringKernel(seqLength,proteinStrings);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Compute the cross-validation error for the string kernel
%(data, kernel, svm, cv, train, get_mean, loss and group2vec are Spider toolbox functions)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%With a 'custom' kernel the SVM reads similarities from the precomputed Gram
%matrix, so the labels vector stands in for the large sequenceFeatures matrix
%and that matrix no longer occupies memory.
sequenceFeatures = labels;
d = data(sequenceFeatures,labels);
k = kernel('custom',K);
a = cv(svm(k));
a.train_on_fold = 1;
a.folds = 5;
r = train(a,d);
get_mean(r)   %prints the mean cross-validation error

d = subplot(2,2,1);
stem(group2vec(loss(r)))
xlabel('Cross validation index');
ylabel('Error');
title('Basic string kernel')
d = subplot(2,2,2);
imagesc(K);
title('Basic string kernel matrix')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Compute the cross-validation error for the mismatch kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

d = data(sequenceFeatures,labels);
k_mismatch = kernel('custom',K_mismatch);
a = cv(svm(k_mismatch));
a.train_on_fold = 1;
a.folds = 5;
r_mismatch = train(a,d);
get_mean(r_mismatch)   %prints the mean cross-validation error

d = subplot(2,2,3);
stem(group2vec(loss(r_mismatch)))
xlabel('Cross validation index');
ylabel('Error');
title('Mismatch kernel')
d = subplot(2,2,4);
imagesc(K_mismatch)
title('Mismatch kernel matrix')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Printing commands
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%d holds the handle of the last subplot, so only that panel's font size changes.
set(d,'fontsize',16)
%set(get(d,'Parent'),'PaperPosition',[0 0 9.6 4.8])
%set(get(d,'Parent'),'PaperPosition',[0 0 6 6])
%print -depsc2 mismatchStringKernel.eps
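The script calls simpleStringKernel and mismatchStringKernel, but their source is not part of this file. According to the script's comments the first one is the spectrum kernel. A minimal sketch of such a kernel follows, assuming it counts shared contiguous substrings of length seqLength (k-mers) with no normalisation; the toy sequences and the names seqs, vocab, Phi and Kspec are hypothetical. Each sequence becomes a vector of k-mer counts, and the Gram matrix holds the inner products of those vectors.

```matlab
% Spectrum kernel sketch (assumed behaviour, not the original simpleStringKernel).
seqs = {'MKVLA', 'KVLAM', 'AAAAA'};   % hypothetical toy sequences
k = 3;                                % plays the role of seqLength
n = numel(seqs);

% Collect every contiguous k-mer that occurs in any sequence.
allKmers = {};
for i = 1:n
  s = seqs{i};
  for p = 1:numel(s)-k+1
    allKmers{end+1} = s(p:p+k-1);
  end
end
vocab = unique(allKmers);   % sorted list of the distinct k-mers

% Phi(i,c) counts how often k-mer vocab{c} occurs in sequence i.
Phi = zeros(n, numel(vocab));
for i = 1:n
  s = seqs{i};
  for p = 1:numel(s)-k+1
    [~, c] = ismember({s(p:p+k-1)}, vocab);
    Phi(i,c) = Phi(i,c) + 1;
  end
end

% Gram matrix: inner products of the k-mer count vectors.
Kspec = Phi * Phi';
disp(Kspec)   % [3 2 0; 2 3 0; 0 0 9]
```

For the script itself, seqs would be proteinStrings and k would be seqLength. Only k-mers that actually occur get a column, so Phi stays small even though 20^k k-mers are possible over the amino-acid alphabet.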
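The mismatch kernel also counts k-mers, but lets each k-mer match every length-k string within m mismatches. The sketch below assumes m = 1 and an alphabet of L = 20 amino acids; mismatchStringKernel may use other values, and seqs, overlap and Kmis are hypothetical names. K(x,y) is the sum, over every k-mer a of x and every k-mer b of y, of the number of length-k strings within one mismatch of both. For a and b at Hamming distance d that number is 1 + k(L-1) when d = 0, L when d = 1, 2 when d = 2 and 0 when d ≥ 3, so the L^k features never have to be enumerated.

```matlab
% (k,1)-mismatch kernel sketch. Assumptions: one allowed mismatch and an
% alphabet of L = 20 letters; not the original mismatchStringKernel.
seqs = {'MKVLA', 'KVLAM', 'AAAAA'};   % hypothetical toy sequences
k = 3;
L = 20;                               % assumed alphabet size

% overlap(d+1) = number of strings within one mismatch of two k-mers that
% are at Hamming distance d (zero for d >= 3).
overlap = [1 + k*(L-1), L, 2, zeros(1, max(0, k-2))];

n = numel(seqs);
Kmis = zeros(n);
for i = 1:n
  for j = 1:n
    x = seqs{i};
    y = seqs{j};
    for p = 1:numel(x)-k+1
      for q = 1:numel(y)-k+1
        d = sum(x(p:p+k-1) ~= y(q:q+k-1));   % Hamming distance of the k-mers
        Kmis(i,j) = Kmis(i,j) + overlap(d+1);
      end
    end
  end
end
disp(Kmis)   % [174 116 6; 116 174 12; 6 12 522]
```

The loop over k-mer pairs is quadratic in sequence length, so this is meant for short illustrative strings rather than as a replacement for mismatchStringKernel on the full data set.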
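The cross-validation step uses the Spider toolbox's cv and svm objects. Without that toolbox, the underlying idea (a k-fold error estimated from a precomputed Gram matrix alone) can be sketched with a simpler classifier swapped in for the SVM: each test point goes to the class whose mean in feature space is nearer, which needs only kernel values. This is not the author's method; the toy data, the interleaved fold assignment and the names Kc, foldErr and meanErr are assumptions for illustration.

```matlab
% Sketch: k-fold cross-validation error from a precomputed Gram matrix, using
% a nearest-class-mean classifier in feature space as a stand-in for the SVM.
X = [3 0; 2 1; 0 3; 1 2; 3 1; 0 2];   % hypothetical toy feature vectors
y = [1; 1; -1; -1; -1; -1];           % toy labels; point 5 looks like class +1
Kc = X * X';                           % toy Gram matrix (linear kernel)
folds = 3;
n = numel(y);
foldId = mod(0:n-1, folds)' + 1;       % interleaved fold assignment (assumption)

foldErr = zeros(folds, 1);
for f = 1:folds
  te = find(foldId == f);              % test indices for this fold
  tr = find(foldId ~= f);              % training indices
  pos = tr(y(tr) == 1);
  neg = tr(y(tr) == -1);
  % ||phi(x)-c_+||^2 < ||phi(x)-c_-||^2 is equivalent to score(x) > 0, where
  % c_+ and c_- are the class means in feature space.
  b = (mean(mean(Kc(neg,neg))) - mean(mean(Kc(pos,pos)))) / 2;
  score = mean(Kc(te,pos), 2) - mean(Kc(te,neg), 2) + b;
  pred = sign(score);
  foldErr(f) = mean(pred ~= y(te));    % fraction misclassified in this fold
end
meanErr = mean(foldErr);
disp(foldErr')   % per-fold errors: 0, 0.5, 0
disp(meanErr)    % 1/6
```

Each class must appear in every training split; otherwise the mean over an empty index set returns NaN. In the script, Kc and y would be K (or K_mismatch) and labels, with folds = 5.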
