% overlap.m
%% Data visualization contest: Code overlap
% The aim here is to visualize the degree of code overlap between all
% pairs of entries.
load contest_data
n = length([d.id]);
t = [d.timestamp];
nlines = length(allLineList);
% clean up author names: lower case, with '_', '.', '&' and spaces removed
cleannames = regexprep(lower({d.author}), '_|\.|&| ', '')';
% entry labels of the form <author>:<5-digit entry id>
lbls = strcat(cleannames, textwrap({sprintf(':%.5d', [d.id])}, 6));
% % contest phases
% [v,ind]=sort(find([d.twilight]));
% twilight=v(ind([1;end]));
% [v,ind]=sort(find([d.daylight]));
% daylight=v(ind([1;end]));
% unique authors
cn = unique(cleannames);
% number of unique authors
nc = length(cn);
% author index of each entry (position of its author in cn)
nl = zeros(n,1);
for ii = 1:nc
    nl(strmatch(cn{ii}, cleannames, 'exact')) = ii;
end

%% Quantify code overlap
% The overlap coefficient between a pair of entries is the number of shared
% lines divided by the length of the shorter entry. This yields a
% symmetric measure that ranges from 0 to 1. The pairwise overlap matrix is
% simple to compute but time-consuming, so I've included a low-precision
% copy of it (pwoverlap.mat).
if ~exist('pwoverlap.mat', 'file')
    % The overlap matrix is symmetric and its diagonal is 1, so only the
    % lower triangle is saved, in the pair order squareform expects
    ov = zeros(n*(n-1)/2, 1, 'single');
    ctr = 0;
    for ii = 1:n-1
        d1 = d(ii).lines;
        nd1 = length(d1);
        for jj = ii+1:n
            ctr = ctr + 1;
            d2 = d(jj).lines;
            % shared lines / length of the shorter entry
            ov(ctr) = length(intersect(d1, d2)) / min(nd1, length(d2));
        end
    end
    % an entry with no lines gives 0/0 = NaN; treat it as no overlap
    ov(isnan(ov)) = 0;
    save('pwoverlap.mat', 'ov')
else
    load pwoverlap.mat
    % the included file is a low-resolution version stored as char
    % (2 bytes per value), with each coefficient scaled by 100
    if isequal(class(ov), 'char')
        % scale back to a fraction in [0, 1]
        ov = single(double(ov)/100);
    end
end
ovmat = squareform(ov);

%% Pairwise overlap heatmap
% A heatmap of the pairwise overlap coefficients, arranged by timestamp,
% reveals the extent to which the entries were similar. There is minimal
% overlap during the darkness and twilight phases. During daylight the
% overlap is usually restricted to a single day.
% The entries of the 1000-character challenge [5/14 to 5/15] are quite
% distinct from the rest.
figure
imagesc(ovmat);
% one tick per contest day, at the time of day of the first entry
dv = datevec(t(1));
ds = cell(8,1);
dind = zeros(8,1);
day0 = dv(3) - 1;
for ii = 1:8
    dv(3) = day0 + ii;
    % index of the last entry submitted at or before this time
    dind(ii) = find(t <= datenum(dv), 1, 'last');
    ds{ii} = datestr(dv, 'mm/dd');
end
axis xy
set(gca, 'tickdir', 'out', 'xtick', dind, 'xticklabel', ds, ...
    'ytick', dind, 'yticklabel', ds, 'fontweight', 'bold')
xlabel('submission time');
ylabel('submission time');
title('Code overlap', 'fontsize', 12, 'fontweight', 'bold')

%% Visualize a single author's results
% Display the overlap between entries for a single author.
authname = 'yicao';
% 'exact' avoids also matching longer names that start with authname
idx = find(nl == strmatch(authname, cn, 'exact'));
% submatrix of pairwise overlaps among this author's entries
x = ovmat(idx, idx);
figure
imagesc(x);
title(sprintf('Code overlap for %s', authname), 'fontsize', 12, 'fontweight', 'bold')
ds = datestr(t(idx(get(gca, 'xtick'))), 'mm/dd');
axis xy
set(gca, 'tickdir', 'out', 'xticklabel', ds, 'yticklabel', ds, 'fontweight', 'bold')
xlabel('submission time')
ylabel('submission time')

%% Network analysis
% Use the Bioinformatics Toolbox to graph the highly overlapping entries.
% Visualizing large networks is one area where I find MATLAB lacking and
% need to resort to third-party tools. Perhaps we will see better network
% visualization tools in the future?
% overlap above which an edge is considered significant
ov_cutoff = 0.7;
if ~isempty(ver('bioinfo'))
    % keep only edges whose overlap exceeds the cutoff
    xcm = x .* (x > ov_cutoff);
    bg = biograph(xcm, lbls(idx), 'LayoutType', 'radial', 'showarrows', 'off');
    view(bg)
else
    error('Requires the Bioinformatics Toolbox')
end
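
%% Worked example: overlap coefficient for two toy entries
% A minimal sketch, not part of the original analysis, that applies the
% overlap coefficient defined in "Quantify code overlap" to hand-made data.
% entryA, entryB and emptyEntry are hypothetical entries represented as
% vectors of line IDs (assumed to stand in for d(k).lines). The Network
% analysis cell stops with an error when the Bioinformatics Toolbox is
% missing, so run this cell on its own.
entryA = [3 7 12];        % hypothetical entry with 3 lines
entryB = [3 7 20 41];     % hypothetical entry with 4 lines
shared = intersect(entryA, entryB);   % lines common to both: [3 7]
% shared lines / length of the shorter entry: 2 / min(3, 4) = 2/3
ovAB = numel(shared) / min(numel(entryA), numel(entryB));
fprintf('shared lines: %d, overlap: %.3f\n', numel(shared), ovAB)

% An entry with no lines gives 0/0 = NaN, which is mapped to 0 exactly as
% ov(isnan(ov)) = 0 does in the main loop
emptyEntry = [];
ovEmpty = numel(intersect(entryA, emptyEntry)) / min(numel(entryA), numel(emptyEntry));
ovEmpty(isnan(ovEmpty)) = 0;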