stringtowordvector.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_13) on Tue Jul 15 15:48:51 NZST 2008 --><TITLE>StringToWordVector</TITLE><META NAME="keywords" CONTENT="weka.filters.unsupervised.attribute.StringToWordVector class"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){    parent.document.title="StringToWordVector";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY="">  <TR ALIGN="center" VALIGN="top">  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> &nbsp;<FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A>&nbsp;</TD>  <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1">    <A HREF="../../../../help-doc.html"><FONT 
CLASS="NavBarFont1"><B>Help</B></FONT></A>&nbsp;</TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="http://www.cs.waikato.ac.nz/ml/weka/" target="_blank"><FONT CLASS="NavBarFont1"><B>Weka's home</B></FONT></A>&nbsp;</TD>  </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">&nbsp;<A HREF="../../../../weka/filters/unsupervised/attribute/StringToNominal.html" title="class in weka.filters.unsupervised.attribute"><B>PREV CLASS</B></A>&nbsp;&nbsp;<A HREF="../../../../weka/filters/unsupervised/attribute/SwapValues.html" title="class in weka.filters.unsupervised.attribute"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2">  <A HREF="../../../../index.html?weka/filters/unsupervised/attribute/StringToWordVector.html" target="_top"><B>FRAMES</B></A>  &nbsp;&nbsp;<A HREF="StringToWordVector.html" target="_top"><B>NO FRAMES</B></A>  &nbsp;&nbsp;<SCRIPT type="text/javascript">  <!--  if(window==top) {    document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>');  }  //--></SCRIPT><NOSCRIPT>  <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">  SUMMARY:&nbsp;NESTED&nbsp;|&nbsp;<A HREF="#field_summary">FIELD</A>&nbsp;|&nbsp;<A HREF="#constructor_summary">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL:&nbsp;<A HREF="#field_detail">FIELD</A>&nbsp;|&nbsp;<A HREF="#constructor_detail">CONSTR</A>&nbsp;|&nbsp;<A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">weka.filters.unsupervised.attribute</FONT><BR>Class StringToWordVector</H2><PRE>java.lang.Object  <IMG 
SRC="../../../../resources/inherit.gif" ALT="extended by "><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">weka.filters.Filter</A>      <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><B>weka.filters.unsupervised.attribute.StringToWordVector</B></PRE><DL><DT><B>All Implemented Interfaces:</B> <DD>java.io.Serializable, <A HREF="../../../../weka/core/CapabilitiesHandler.html" title="interface in weka.core">CapabilitiesHandler</A>, <A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A>, <A HREF="../../../../weka/core/RevisionHandler.html" title="interface in weka.core">RevisionHandler</A>, <A HREF="../../../../weka/filters/UnsupervisedFilter.html" title="interface in weka.filters">UnsupervisedFilter</A></DD></DL><HR><DL><DT><PRE>public class <B>StringToWordVector</B><DT>extends <A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A><DT>implements <A HREF="../../../../weka/filters/UnsupervisedFilter.html" title="interface in weka.filters">UnsupervisedFilter</A>, <A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></DL></PRE><P><!-- globalinfo-start --> Converts String attributes into a set of attributes representing word occurrence (depending on the tokenizer) information from the text contained in the strings. The set of words (attributes) is determined by the first batch filtered (typically training data). <p/> <!-- globalinfo-end -->  <!-- options-start --> Valid options are: <p/>  <pre> -C  Output word counts rather than boolean word presence. </pre>  <pre> -R &lt;index1,index2-index4,...&gt;  Specify list of string attributes to convert to words (as weka Range).  (default: select all string attributes)</pre>  <pre> -V  Invert matching sense of column indexes.</pre>  <pre> -P &lt;attribute name prefix&gt;  Specify a prefix for the created attribute names.  
(default: "")</pre>  <pre> -W &lt;number of words to keep&gt;  Specify approximate number of word fields to create.  Surplus words will be discarded.  (default: 1000)</pre>  <pre> -prune-rate &lt;rate as a percentage of dataset&gt;  Specify the rate (e.g., every 10% of the input dataset) at which to periodically prune the dictionary.  -W prunes after creating a full dictionary. You may not have enough memory for this approach.  (default: no periodic pruning)</pre>  <pre> -T  Transform the word frequencies into log(1+fij)  where fij is the frequency of word i in the jth document (instance). </pre>  <pre> -I  Transform each word frequency into:  fij*log(num of documents/num of documents containing word i)    where fij is the frequency of word i in the jth document (instance)</pre>  <pre> -N  Whether to 0=not normalize/1=normalize all data/2=normalize test data only  to average length of training documents (default 0=don't normalize).</pre>  <pre> -L  Convert all tokens to lowercase before adding to the dictionary.</pre>  <pre> -S  Ignore words that are in the stoplist.</pre>  <pre> -stemmer &lt;spec&gt;  The stemming algorithm (classname plus parameters) to use.</pre>  <pre> -M &lt;int&gt;  The minimum term frequency (default = 1).</pre>  <pre> -O  If this is set, the maximum number of words and the   minimum term frequency are not enforced on a per-class   basis but based on the documents in all the classes   (even if a class attribute is set).</pre>  <pre> -stopwords &lt;file&gt;  A file containing stopwords to override the default ones.  Using this option automatically sets the flag ('-S') to use the  stoplist if the file exists.  Format: one stopword per line, lines starting with '#'  are interpreted as comments and ignored.</pre>  <pre> -tokenizer &lt;spec&gt;  The tokenizing algorithm (classname plus parameters) to use.  
(default: weka.core.tokenizers.WordTokenizer)</pre>  <!-- options-end --><P><P><DL><DT><B>Version:</B></DT>  <DD>$Revision: 1.25 $</DD><DT><B>Author:</B></DT>  <DD>Len Trigg (len@reeltwo.com), Stuart Inglis (stuart@reeltwo.com), Gordon Paynter (gordon.paynter@ucr.edu), Asrhaf M. Kibriya (amk14@cs.waikato.ac.nz)</DD><DT><B>See Also:</B><DD><A HREF="../../../../weka/core/Stopwords.html" title="class in weka.core"><CODE>Stopwords</CODE></A>, <A HREF="../../../../serialized-form.html#weka.filters.unsupervised.attribute.StringToWordVector">Serialized Form</A></DL><HR><P><!-- =========== FIELD SUMMARY =========== --><A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Field Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;int</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#FILTER_NONE">FILTER_NONE</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;normalization: No normalization.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;int</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#FILTER_NORMALIZE_ALL">FILTER_NORMALIZE_ALL</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;normalization: Normalize all data.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;int</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#FILTER_NORMALIZE_TEST_ONLY">FILTER_NORMALIZE_TEST_ONLY</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;normalization: Normalize test data only.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>static&nbsp;<A HREF="../../../../weka/core/Tag.html" title="class in weka.core">Tag</A>[]</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#TAGS_FILTER">TAGS_FILTER</A></B></CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Specifies whether document's (instance's) word frequencies are to be normalized.</TD></TR></TABLE>&nbsp;<!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#StringToWordVector()">StringToWordVector</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default constructor.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#StringToWordVector(int)">StringToWordVector</A></B>(int&nbsp;wordsToKeep)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Constructor that allows specification of the target number of words in the output.</TD></TR></TABLE>&nbsp;<!-- ========== METHOD SUMMARY =========== --><A NAME="method_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT 
SIZE="+2"><B>Method Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#attributeIndicesTipText()">attributeIndicesTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#attributeNamePrefixTipText()">attributeNamePrefixTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#batchFinished()">batchFinished</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Signify that this batch of input to the filter is finished.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#doNotOperateOnPerClassBasisTipText()">doNotOperateOnPerClassBasisTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A 
HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#getAttributeIndices()">getAttributeIndices</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Gets the current range selection.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#getAttributeNamePrefix()">getAttributeNamePrefix</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get the attribute name prefix.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;<A HREF="../../../../weka/core/Capabilities.html" title="class in weka.core">Capabilities</A></CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#getCapabilities()">getCapabilities</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the Capabilities of this filter.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;boolean</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#getDoNotOperateOnPerClassBasis()">getDoNotOperateOnPerClassBasis</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Get the DoNotOperateOnPerClassBasis value.</TD>
