<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">
<html xmlns:mwsh="http://www.mathworks.com/namespace/mcode/v1/syntaxhighlight.dtd">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!--This HTML is auto-generated from an M-file. To make changes, update the M-file and republish this document. -->
<title>overlap</title>
<meta name="generator" content="MATLAB 7.4">
<meta name="date" content="2009-04-08">
<meta name="m-file" content="overlap">
<style>
body { background-color: white; margin:10px;}
h1 { color: #990000; font-size: x-large;}
h2 { color: #990000; font-size: medium;}
/* Make the text shrink to fit narrow windows, but not stretch too far in wide windows. */
p,h1,h2,div.content div { max-width: 600px; /* Hack for IE6 */ width: auto !important; width: 600px;}
pre.codeinput { background: #EEEEEE; padding: 10px;}
@media print { pre.codeinput {word-wrap:break-word; width:100%;}}
span.keyword {color: #0000FF}
span.comment {color: #228B22}
span.string {color: #A020F0}
span.untermstring {color: #B20000}
span.syscmd {color: #B28C00}
pre.codeoutput { color: #666666; padding: 10px;}
pre.error { color: red;}
p.footer { text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;}
</style>
</head>
<body>
<div class="content">
<h2>Contents</h2>
<div>
<ul>
<li><a href="#1">Data visualization contest: Code overlap</a></li>
<li><a href="#2">Quantify code overlap</a></li>
<li><a href="#3">A Pairwise overlap heatmap</a></li>
<li><a href="#4">Visualize a single author's results</a></li>
<li><a href="#5">Network analysis</a></li>
</ul>
</div>
<h2>Data visualization contest: Code overlap<a name="1"></a></h2>
<p>The aim here is to visualize the degree of code overlap between all pairs of entries.</p>
<pre class="codeinput">load <span class="string">contest_data</span>
n = length([d.id]);
t=[d.timestamp];
nlines=length(allLineList);
<span class="comment">% cleanup author names</span>
cleannames = regexprep(lower({d.author}), <span 
class="string">'_|\.|&amp;| '</span>, <span class="string">''</span>)';
<span class="comment">% author labels</span>
lbls = strcat(cleannames,textwrap({sprintf(<span class="string">':%.5d'</span>,[d.id])}, 6));
<span class="comment">% % contest phases</span>
<span class="comment">% [v,ind]=sort(find([d.twilight]));</span>
<span class="comment">% twilight=v(ind([1;end]));</span>
<span class="comment">% [v,ind]=sort(find([d.daylight]));</span>
<span class="comment">% daylight=v(ind([1;end]));</span>
<span class="comment">% unique authors</span>
cn=unique(cleannames);
<span class="comment">% number of unique authors</span>
nc=length(cn);
<span class="comment">% index of each author's entries</span>
nl=zeros(n,1);
<span class="keyword">for</span> ii=1:nc
    nl(strmatch(cn{ii},cleannames,<span class="string">'exact'</span>))=ii;
<span class="keyword">end</span></pre>
<h2>Quantify code overlap<a name="2"></a></h2>
<p>The overlap coefficient between a pair of entries is the number of shared lines divided by the minimum of the two entry lengths. This yields a symmetric measure that ranges from 0 to 1. The pairwise overlap matrix is simple to compute but time-consuming, so I've included a precomputed low-precision version.</p>
<pre class="codeinput"><span class="keyword">if</span> ~exist(<span class="string">'pwoverlap.mat'</span>, <span class="string">'file'</span>)
    <span class="comment">% pw overlap coef. is symmetric and diag is 1, so save tril</span>
    ov = zeros(n*(n-1)/2, 1, <span class="string">'single'</span>);
    ctr=0;
    <span class="keyword">for</span> ii=1:n-1
        d1 = d(ii).lines;
        nd1=length(d1);
        <span class="keyword">for</span> jj=ii+1:n
            ctr=ctr+1;
            d2 = d(jj).lines;
            <span class="comment">% divide by the length of the shorter entry</span>
            ov(ctr) = length(intersect(d1,d2)) / min(nd1, length(d2));
        <span class="keyword">end</span>
    <span class="keyword">end</span>
    ov(isnan(ov))=0;
    save (<span class="string">'pwoverlap.mat'</span>,<span class="string">'ov'</span>)
<span class="keyword">else</span>
    load <span class="string">pwoverlap.mat</span>
    <span class="comment">% included file is a low res version saved as a 2 byte char</span>
    <span class="keyword">if</span> isequal(class(ov),<span class="string">'char'</span>)
        <span class="comment">% scale back to float</span>
        ov = single(ov/100);
    <span class="keyword">end</span>
<span class="keyword">end</span>
ovmat=squareform(ov);</pre>
<h2>A Pairwise overlap heatmap<a name="3"></a></h2>
<p>A heatmap of the pairwise overlap coefficients, with entries ordered by timestamp, reveals how similar the entries were. There is minimal overlap during the darkness and twilight phases. During daylight, the overlap is usually confined to entries from a single day. The entries for the 1000-character challenge [5/14 to 5/15] are quite distinct from the rest.
</p><pre class="codeinput">figure
imagesc(squareform(ov));
<span class="comment">% set axis ticks for 12 noon of each day</span>
dv = datevec(t(1));
ds = cell(8,1);
dind = zeros(8,1);
day0 = dv(3)-1;
<span class="keyword">for</span> ii=1:8;
    dv(3) = day0+ii;
    dind(ii) = find(t&lt;=datenum(dv), 1, <span class="string">'last'</span> );
    ds{ii} = datestr(dv,<span class="string">'mm/dd'</span>);
<span class="keyword">end</span>
axis <span class="string">xy</span>
set(gca,<span class="string">'tickdir'</span>,<span class="string">'out'</span>,<span class="string">'xtick'</span>,dind, <span class="string">'xticklabel'</span>, ds, <span class="string">'ytick'</span>,dind, <span class="string">'yticklabel'</span>, ds, <span class="string">'fontweight'</span>, <span class="string">'bold'</span>)
xlabel(<span class="string">'submission time'</span>);
ylabel(<span class="string">'submission time'</span>);
title(<span class="string">'Code overlap'</span>, <span class="string">'fontsize'</span>, 12, <span class="string">'fontweight'</span>, <span class="string">'bold'</span>)</pre>
<img vspace="5" hspace="5" src="overlap_01.png">
<h2>Visualize a single author's results<a name="4"></a></h2>
<p>Display the overlap between entries for a single author.</p>
<pre class="codeinput">authname =<span class="string">'yicao'</span>;
<span class="comment">% exact match, so that longer names starting with authname are not picked up</span>
idx = find(nl==strmatch(authname, cn, <span class="string">'exact'</span>));
nidx=length(idx);
col_ord = repmat(idx,1,nidx)+repmat((idx-1)',nidx,1)*n;
x=ovmat(col_ord);
figure
imagesc(x);
title(sprintf (<span class="string">'Code overlap for %s'</span>,authname), <span class="string">'fontsize'</span>,12, <span class="string">'fontweight'</span>, <span class="string">'bold'</span>)
ds=datestr(t(idx(get(gca,<span class="string">'xtick'</span>))),<span class="string">'mm/dd'</span>);
axis <span class="string">xy</span>
set(gca, <span class="string">'tickdir'</span>, <span class="string">'out'</span>, <span class="string">'xticklabel'</span>,ds, <span class="string">'yticklabel'</span>, ds, <span 
class="string">'fontweight'</span>,<span class="string">'bold'</span>)
xlabel(<span class="string">'submission time'</span>)
ylabel(<span class="string">'submission time'</span>)</pre>
<img vspace="5" hspace="5" src="overlap_02.png">
<h2>Network analysis<a name="5"></a></h2>
<p>Use the Bioinformatics Toolbox to graph the highly overlapping entries. Visualizing large networks is one area where I find MATLAB lacking and need to resort to third-party tools. Perhaps we will see better network visualization tools in the future?</p>
<pre class="codeinput"><span class="comment">% value at which edges are considered significant</span>
ov_cutoff=0.7;
<span class="keyword">if</span> ~isempty(ver(<span class="string">'bioinfo'</span>))
    xcm=x.*(x&gt;ov_cutoff);
    bg=biograph(xcm,lbls(idx),<span class="string">'LayoutType'</span>,<span class="string">'radial'</span>,<span class="string">'showarrows'</span>,<span class="string">'off'</span>);
    view(bg)
<span class="keyword">else</span>
    error(<span class="string">'Requires the Bioinformatics Toolbox'</span>)
<span class="keyword">end</span></pre>
<img vspace="5" hspace="5" src="overlap_03.png">
<p class="footer"><br> Published with MATLAB® 7.4<br></p>
</div>
<!--
##### SOURCE BEGIN #####
%% Data visualization contest: Code overlap
% The aim here is to visualize the degree of code overlap between
% all pairs of entries.

load contest_data
n = length([d.id]);
t=[d.timestamp];
nlines=length(allLineList);
% cleanup author names
cleannames = regexprep(lower({d.author}), '_|\.|&| ', '')';
% author labels
lbls = strcat(cleannames,textwrap({sprintf(':%.5d',[d.id])}, 6));
% % contest phases
% [v,ind]=sort(find([d.twilight]));
% twilight=v(ind([1;end]));
% [v,ind]=sort(find([d.daylight]));
% daylight=v(ind([1;end]));
% unique authors
cn=unique(cleannames);
% number of unique authors
nc=length(cn);
% index of each author's entries
nl=zeros(n,1);
for ii=1:nc
    nl(strmatch(cn{ii},cleannames,'exact'))=ii;
end

%% Quantify code overlap
% The overlap coefficient between a pair of entries
% is the number of shared lines divided by the minimum of the two entry
% lengths. This yields a symmetric measure that ranges from 0 to 1. The
% pairwise overlap matrix is simple to compute but time-consuming, so I've
% included a precomputed low-precision version.

if ~exist('pwoverlap.mat', 'file')
    % pw overlap coef. is symmetric and diag is 1, so save tril
    ov = zeros(n*(n-1)/2, 1, 'single');
    ctr=0;
    for ii=1:n-1
        d1 = d(ii).lines;
        nd1=length(d1);
        for jj=ii+1:n
            ctr=ctr+1;
            d2 = d(jj).lines;
            % divide by the length of the shorter entry
            ov(ctr) = length(intersect(d1,d2)) / min(nd1, length(d2));
        end
    end
    ov(isnan(ov))=0;
    save ('pwoverlap.mat','ov')
else
    load pwoverlap.mat
    % included file is a low res version saved as a 2 byte char
    if isequal(class(ov),'char')
        % scale back to float
        ov = single(ov/100);
    end
end
ovmat=squareform(ov);

%% A Pairwise overlap heatmap
% A heatmap of the pairwise overlap coefficients, with entries ordered by
% timestamp, reveals how similar the entries were. There is minimal
% overlap during the darkness and twilight phases. During daylight, the
% overlap is usually confined to entries from a single day.
% The entries for the 1000-character challenge [5/14 to 5/15] are quite
% distinct from the rest.

figure
imagesc(squareform(ov));
% set axis ticks for 12 noon of each day
dv = datevec(t(1));
ds = cell(8,1);
dind = zeros(8,1);
day0 = dv(3)-1;
for ii=1:8;
    dv(3) = day0+ii;
    dind(ii) = find(t<=datenum(dv), 1, 'last' );
    ds{ii} = datestr(dv,'mm/dd');
end
axis xy
set(gca,'tickdir','out','xtick',dind, 'xticklabel', ds, 'ytick',dind, 'yticklabel', ds, 'fontweight', 'bold')
xlabel('submission time');
ylabel('submission time');
title('Code overlap', 'fontsize', 12, 'fontweight', 'bold')

%% Visualize a single author's results
% Display the overlap between entries for a single author.

authname ='yicao';
% exact match, so that longer names starting with authname are not picked up
idx = find(nl==strmatch(authname, cn, 'exact'));
nidx=length(idx);
col_ord = repmat(idx,1,nidx)+repmat((idx-1)',nidx,1)*n;
x=ovmat(col_ord);
figure
imagesc(x);
title(sprintf ('Code overlap for %s',authname), 'fontsize',12, 'fontweight', 'bold')
ds=datestr(t(idx(get(gca,'xtick'))),'mm/dd');
axis xy
set(gca, 'tickdir', 'out', 'xticklabel',ds, 'yticklabel', ds, 'fontweight','bold')
xlabel('submission time')
ylabel('submission time')

%% Network analysis
% Use the Bioinformatics Toolbox to graph the highly overlapping
% entries.
% Visualizing large networks is one area where I find MATLAB lacking and need to resort to third-party tools.
% Perhaps we will see better network visualization tools in the future?

% value at which edges are considered significant
ov_cutoff=0.7;
if ~isempty(ver('bioinfo'))
    xcm=x.*(x>ov_cutoff);
    bg=biograph(xcm,lbls(idx),'LayoutType','radial','showarrows','off');
    view(bg)
else
    error('Requires the Bioinformatics Toolbox')
end
##### SOURCE END #####
-->
</body></html>