
📄 seq__tokenizer_8h-source.html

📁 A template library of common data mining algorithms (a C++ data mining template library for UNIX)
💻 HTML
<a name="l00158"></a>00158                 <span class="keyword">delete</span> p;<a name="l00159"></a>00159                 <a name="l00160"></a>00160               }<span class="comment">//end if(vat_hmap.find())</span><a name="l00161"></a>00161                 <span class="keywordflow">else</span> {<a name="l00162"></a>00162                   <span class="comment">//create a new vat &amp; insert it</span><a name="l00163"></a>00163                   svat=<span class="keyword">new</span> <a class="code" href="classvat.html">VAT</a>();<a name="l00164"></a>00164                   <span class="keyword">typename</span> VAT::INSTANCES new_tidlist;<a name="l00165"></a>00165                   new_tidlist.push_back(<a class="code" href="classseq__instance.html">INSTANCE</a>(ts,sequence_pos));<a name="l00166"></a>00166                   svat-&gt;push_back(make_pair(tid, new_tidlist));<a name="l00167"></a>00167                   <span class="keywordflow">if</span>(!vat_hmap.add_vat(p, svat)) {<a name="l00168"></a>00168                     cerr&lt;&lt;<span class="stringliteral">"tokenizer.get_length_one: add_vat failed"</span>&lt;&lt;endl;<a name="l00169"></a>00169                     <span class="keywordflow">return</span> -1;<a name="l00170"></a>00170                   }<a name="l00171"></a>00171                   freq_pats.push_back(p);<a name="l00172"></a>00172                 }<span class="comment">//end else</span><a name="l00173"></a>00173                 <a name="l00174"></a>00174           }<span class="comment">//end switch</span><a name="l00175"></a>00175           <a name="l00176"></a>00176         }<span class="comment">//end while</span><a name="l00177"></a>00177         <a name="l00178"></a>00178       }<span class="keywordflow">while</span>(<span class="keyword">true</span>);<a name="l00179"></a>00179       <a name="l00180"></a>00180       <span class="keywordflow">return</span> -1;<a name="l00181"></a>00181     }<span class="comment">//end 
parse_next_trans()</span><a name="l00182"></a>00182   <a name="l00183"></a>00183 <span class="keyword">private</span>:<a name="l00184"></a>00184     <span class="keywordtype">int</span> MAXLINE; <a name="l00185"></a>00185   <a class="code" href="classelement__parser.html">element_parser&lt;V_T&gt;</a> el_prsr; <a name="l00187"></a>00187 }; <span class="comment">//end class seq_tokenizer</span><a name="l00188"></a>00188 <a name="l00189"></a>00189 <span class="keyword">template</span>&lt;<span class="keyword">class</span> PP, <span class="keyword">typename</span> MP, <span class="keyword">typename</span> TP, <span class="keyword">typename</span> PAT_ST, <span class="keyword">template</span>&lt;<span class="keyword">typename</span>, <span class="keyword">typename</span>, <span class="keyword">typename</span>, <span class="keyword">template</span> &lt;<span class="keyword">typename</span>&gt; <span class="keyword">class </span>&gt; <span class="keyword">class </span>CC, <a name="l00190"></a>00190 template &lt;typename&gt; class ALLOC &gt;<a name="l00191"></a>00191 class tokenizer&lt;SEQ_PATTERN, FASTA_TKNZ_PROP, ALLOC &gt;<a name="l00192"></a>00192 {<a name="l00193"></a>00193   <a name="l00194"></a>00194 <span class="keyword">public</span>:<a name="l00195"></a>00195   <span class="keyword">typedef</span> <a class="code" href="classpattern__support.html">pattern_support&lt;V_Fkk_MINE_PROP&gt;</a> PAT_SUP;<a name="l00196"></a>00196   <span class="keyword">typedef</span> <a class="code" href="classvat.html">vat&lt;SEQ_PROP, V_Fkk_MINE_PROP, ALLOC, std::vector &gt;</a> VAT;<a name="l00197"></a>00197   <span class="keyword">typedef</span> seq_instance &lt;V_Fkk_MINE_PROP&gt; INSTANCE;<a name="l00198"></a>00198   <span class="keyword">typedef</span> <span class="keyword">typename</span> SEQ_PATTERN::VERTEX_T V_T;<a name="l00199"></a>00199   <span class="keyword">typedef</span> <span class="keyword">typename</span> SEQ_PATTERN::EDGE_T E_T;<a name="l00200"></a>00200   <a 
name="l00201"></a>00201   <a name="l00202"></a>00202   tokenizer(<span class="keywordtype">int</span> max=LINE_SZ): MAXLINE(max) {} <a name="l00210"></a>00210   <span class="keyword">template</span>&lt;<span class="keyword">class</span> SM_T&gt;<a name="l00211"></a>00211   <span class="keywordtype">int</span> parse_next_trans(ifstream&amp; infile, <a class="code" href="classpat__fam.html">pat_fam&lt;SEQ_PATTERN&gt;</a>&amp; freq_pats, <a name="l00212"></a>00212                        <a class="code" href="classstorage__manager.html">storage_manager&lt;SEQ_PATTERN, VAT, ALLOC, SM_T&gt;</a>&amp; vat_hmap ) {<a name="l00213"></a>00213     <a name="l00214"></a>00214     <span class="keywordtype">char</span>* line=<span class="keyword">new</span> <span class="keywordtype">char</span>[MAXLINE];<a name="l00215"></a>00215     <span class="keywordtype">char</span> word[MAXLINE];<a name="l00216"></a>00216     <span class="keywordtype">char</span>* startline=line;<a name="l00217"></a>00217     <a name="l00218"></a>00218     <span class="keywordtype">int</span> i=0, len, seqlen=0;<a name="l00219"></a>00219     <span class="keyword">static</span> <span class="keywordtype">int</span> tid=-1;<a name="l00220"></a>00220     <span class="keywordtype">int</span> pos; <span class="comment">//stores starting position of input stream's get pointer</span><a name="l00221"></a>00221     VAT* svat;<a name="l00222"></a>00222     <span class="keywordtype">bool</span> first = <span class="keyword">true</span>; <span class="comment">//first line of new fasta seq in the file</span><a name="l00223"></a>00223     <a name="l00224"></a>00224     <span class="keywordflow">do</span> {<a name="l00225"></a>00225       pos=infile.tellg();<a name="l00226"></a>00226       line=startline;<a name="l00227"></a>00227       *line = <span class="charliteral">'\0'</span>;<a name="l00228"></a>00228       infile.getline(line, MAXLINE-1);<a name="l00229"></a>00229       <span 
class="comment">//len=infile.gcount();</span><a name="l00230"></a>00230       len = strlen(line);<a name="l00231"></a>00231       <a name="l00232"></a>00232       <span class="keywordflow">if</span>(len == 0){<a name="l00233"></a>00233         <span class="keywordflow">if</span> (infile.eof()) {<a name="l00234"></a>00234           tid= -1;<a name="l00235"></a>00235           <span class="keyword">delete</span>[] startline;<a name="l00236"></a>00236           <span class="keywordflow">return</span> tid;<a name="l00237"></a>00237         }<a name="l00238"></a>00238         <span class="keywordflow">else</span> <span class="keywordflow">continue</span>; <span class="comment">//just a blank line, skip</span><a name="l00239"></a>00239       }<a name="l00240"></a>00240       <a name="l00241"></a>00241       <a name="l00242"></a>00242       <span class="keywordflow">if</span> (line[0] == <span class="charliteral">'&gt;'</span>){<a name="l00243"></a>00243         <span class="keywordflow">if</span> (first){<a name="l00244"></a>00244           tid++; <span class="comment">// increment the seq id</span><a name="l00245"></a>00245           first = <span class="keyword">false</span>;<a name="l00246"></a>00246           <span class="keywordflow">continue</span>; <span class="comment">//go onto next line</span><a name="l00247"></a>00247         }<a name="l00248"></a>00248         <span class="keywordflow">else</span>{<a name="l00249"></a>00249           infile.seekg(pos); <span class="comment">//reset the file pos to beginning of </span><a name="l00250"></a>00250                      <span class="comment">//line for next seq</span><a name="l00251"></a>00251           <span class="keyword">delete</span>[] startline;<a name="l00252"></a>00252           <span class="keywordflow">return</span> tid;<a name="l00253"></a>00253         }<a name="l00254"></a>00254       }<a name="l00255"></a>00255       <a name="l00256"></a>00256       <span class="comment">//read the fasta seq</span><a 
name="l00257"></a>00257       <span class="keywordflow">for</span> (i=0; i &lt; len; ++i, ++seqlen){<a name="l00258"></a>00258         <span class="comment">//read each char and insert into VAT</span><a name="l00259"></a>00259         <span class="comment">//this is an element, insert/append to its VAT</span><a name="l00260"></a>00260         SEQ_PATTERN* p = <span class="keyword">new</span> SEQ_PATTERN();<a name="l00261"></a>00261         V_T v = string(1,line[i]);<a name="l00262"></a>00262         <a name="l00263"></a>00263         <a name="l00264"></a>00264         <span class="comment">// Add vertex and update the canonical code.</span><a name="l00265"></a>00265         p-&gt;add_vertex(v);<a name="l00266"></a>00266         p-&gt;init_canonical_code(v);<a name="l00267"></a>00267         <a name="l00268"></a>00268         <span class="comment">//if p contains a vat in vat_hmap, append tid/ts to the entry</span><a name="l00269"></a>00269         <span class="comment">//else create a new vat and insert it into vat_hmap,            //and add p to freq_pats</span><a name="l00270"></a>00270         svat=vat_hmap.get_vat(p);<a name="l00271"></a>00271         <span class="comment">//if(vat_hmap.find(p))</span><a name="l00272"></a>00272         <span class="keywordflow">if</span>(svat != NULL) {<a name="l00273"></a>00273           <span class="comment">//vat found, check if this tid exists in it</span><a name="l00274"></a>00274           <a name="l00275"></a>00275           <span class="keyword">typename</span> VAT::IT vit=svat-&gt;end()-1;<a name="l00276"></a>00276           <span class="keywordflow">if</span>(vit-&gt;first!=tid)<a name="l00277"></a>00277             vit=svat-&gt;end();      <a name="l00278"></a>00278           <a name="l00279"></a>00279           <span class="keywordflow">if</span>(vit!=svat-&gt;end())<a name="l00280"></a>00280             <span class="comment">//tid found</span><a name="l00281"></a>00281             
vit-&gt;second.push_back(INSTANCE(seqlen, seqlen));<a name="l00282"></a>00282           <span class="keywordflow">else</span> {<a name="l00283"></a>00283             <span class="comment">//tid not found</span><a name="l00284"></a>00284             <span class="keyword">typename</span> VAT::INSTANCES new_tidlist;<a name="l00285"></a>00285             new_tidlist.push_back(INSTANCE(seqlen, seqlen));<a name="l00286"></a>00286             svat-&gt;push_back(make_pair(tid, new_tidlist));<a name="l00287"></a>00287           }<a name="l00288"></a>00288           <a name="l00289"></a>00289           <span class="keyword">delete</span> p;<a name="l00290"></a>00290           <a name="l00291"></a>00291         }<span class="comment">//end if(vat_hmap.find())</span><a name="l00292"></a>00292         <span class="keywordflow">else</span> {<a name="l00293"></a>00293           <span class="comment">//create a new vat &amp; insert it</span><a name="l00294"></a>00294           svat=<span class="keyword">new</span> VAT();<a name="l00295"></a>00295           <span class="keyword">typename</span> VAT::INSTANCES new_tidlist;<a name="l00296"></a>00296           new_tidlist.push_back(INSTANCE(seqlen, seqlen));<a name="l00297"></a>00297           svat-&gt;push_back(make_pair(tid, new_tidlist));<a name="l00298"></a>00298           <span class="keywordflow">if</span>(!vat_hmap.add_vat(p, svat)) {<a name="l00299"></a>00299             cerr&lt;&lt;<span class="stringliteral">"tokenizer.get_length_one: add_vat failed"</span>&lt;&lt;endl;<a name="l00300"></a>00300             <span class="keyword">delete</span>[] startline; <span class="keywordflow">return</span> -1; <span class="comment">//release the line buffer on this error path too</span><a name="l00301"></a>00301           }<a name="l00302"></a>00302           freq_pats.push_back(p);<a name="l00303"></a>00303         }<span class="comment">//end else</span><a name="l00304"></a>00304       }<a name="l00305"></a>00305       <a name="l00306"></a>00306     }<span class="keywordflow">while</span>(<span class="keyword">true</span>);<a name="l00307"></a>00307     <a 
name="l00308"></a>00308     <span class="keywordflow">return</span> -1;<a name="l00309"></a>00309   }<span class="comment">//end parse_next_trans()</span><a name="l00310"></a>00310   <a name="l00311"></a>00311 <span class="keyword">private</span>:<a name="l00312"></a>00312     <span class="keywordtype">int</span> MAXLINE; <a name="l00314"></a>00314 }; <span class="comment">//end class seq_tokenizer</span><a name="l00315"></a>00315 <a name="l00316"></a>00316 <span class="preprocessor">#endif</span><a name="l00317"></a>00317 <span class="preprocessor"></span></pre></div><hr size="1"><address style="align: right;"><small>Generated on Wed Jul 26 14:01:08 2006 for DMTL by&nbsp;<a href="http://www.doxygen.org/index.html"><img src="doxygen.png" alt="doxygen" align="middle" border="0"></a> 1.4.7 </small></address></body></html>
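
The FASTA specialization listed above reads one sequence per call and, for every residue character, appends the current sequence id and offset to that character's VAT. The following is a minimal standalone sketch of the same length-one indexing idea, not DMTL code: `Occurrence`, `TidList`, `LengthOneIndex`, `index_fasta`, and `index_fasta_text` are hypothetical names introduced here, a plain `std::map` stands in for `storage_manager`, and the whole stream is read in one pass instead of one sequence per call. It uses `std::string` with `std::getline`, which avoids the fixed `MAXLINE` buffer and its manual `delete[]`.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for DMTL's VAT types; not the library's real API.
using Occurrence = int;  // 0-based offset of a residue within its sequence
using TidList = std::vector<std::pair<int, std::vector<Occurrence>>>;  // (sequence id, offsets)
using LengthOneIndex = std::map<char, TidList>;

// Reads every FASTA record from `in` and records, for each residue character,
// which sequences it occurs in and at which offsets.
LengthOneIndex index_fasta(std::istream& in) {
    LengthOneIndex index;
    std::string line;  // grows as needed, so no fixed-size line buffer
    int tid = -1;      // id of the current sequence; -1 until the first header
    int seqlen = 0;    // offset of the next residue within the current sequence

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF input
        if (line.empty()) continue;  // blank line: skip
        if (line[0] == '>') {        // a header line starts a new sequence
            ++tid;
            seqlen = 0;
            continue;
        }
        if (tid < 0) continue;       // residues before any header are ignored
        for (char c : line) {
            TidList& entries = index[c];
            // Sequences arrive in order, so only the last entry can belong to tid.
            if (entries.empty() || entries.back().first != tid)
                entries.emplace_back(tid, std::vector<Occurrence>{});
            assert(entries.back().first == tid);
            entries.back().second.push_back(seqlen++);
        }
    }
    return index;
}

// Convenience wrapper for in-memory text.
LengthOneIndex index_fasta_text(const std::string& text) {
    std::istringstream in(text);
    return index_fasta(in);
}
```

Checking only the last entry of a character's list mirrors the `svat->end()-1` test in the listing: it is sufficient because sequence ids only ever increase while reading.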
