<A NAME="FILTER_NORMALIZE_ALL"><!-- --></A><H3>FILTER_NORMALIZE_ALL</H3><PRE>public static final int <B>FILTER_NORMALIZE_ALL</B></PRE><DL><DD>normalization: Normalize all data<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_ALL">Constant Field Values</A></DL></DL><HR><A NAME="FILTER_NORMALIZE_TEST_ONLY"><!-- --></A><H3>FILTER_NORMALIZE_TEST_ONLY</H3><PRE>public static final int <B>FILTER_NORMALIZE_TEST_ONLY</B></PRE><DL><DD>normalization: Normalize test data only<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_TEST_ONLY">Constant Field Values</A></DL></DL><HR><A NAME="TAGS_FILTER"><!-- --></A><H3>TAGS_FILTER</H3><PRE>public static final <A HREF="../../../../weka/core/Tag.html" title="class in weka.core">Tag</A>[] <B>TAGS_FILTER</B></PRE><DL><DD>Specifies whether a document's (instance's) word frequencies are to be normalized.  They are normalized to the average length of the documents specified as input format.<P><DL></DL></DL><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="StringToWordVector()"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>()</PRE><DL><DD>Default constructor. 
Targets 1000 words in the output.<P></DL><HR><A NAME="StringToWordVector(int)"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>(int&nbsp;wordsToKeep)</PRE><DL><DD>Constructor that allows specification of the target number of words in the output.<P><DL><DT><B>Parameters:</B><DD><CODE>wordsToKeep</CODE> - the number of words in the output vector (per class if assigned).</DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="listOptions()"><!-- --></A><H3>listOptions</H3><PRE>public java.util.Enumeration <B>listOptions</B>()</PRE><DL><DD>Returns an enumeration describing the available options<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#listOptions()">listOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>an enumeration of all the available options</DL></DD></DL><HR><A NAME="setOptions(java.lang.String[])"><!-- --></A><H3>setOptions</H3><PRE>public void <B>setOptions</B>(java.lang.String[]&nbsp;options)                throws java.lang.Exception</PRE><DL><DD>Parses a given list of options. <p/>    <!-- options-start --> Valid options are: <p/>  <pre> -C  Output word counts rather than boolean word presence. </pre>  <pre> -D &lt;delimiter set&gt;  String containing the set of delimiter characters  (default: " \n\t.,:'\"()?!")</pre>  <pre> -R &lt;index1,index2-index4,...&gt;  Specify list of string attributes to convert to words (as weka Range).  (default: select all string attributes)</pre>  <pre> -P &lt;attribute name prefix&gt;  Specify a prefix for the created attribute names.  
(default: "")</pre>  <pre> -W &lt;number of words to keep&gt;  Specify approximate number of word fields to create.  Surplus words will be discarded.  (default: 1000)</pre>  <pre> -T  Transform the word frequencies into log(1+fij)  where fij is the frequency of word i in the jth document (instance). </pre>  <pre> -I  Transform each word frequency into:  fij*log(num of Documents/num of  documents containing word i)    where fij is the frequency of word i in  the jth document (instance)</pre>  <pre> -N  Whether to 0=not normalize/1=normalize all data/2=normalize test data only  to average length of training documents (default 0=don't normalize).</pre>  <pre> -A  Only form tokens from contiguous alphabetic sequences  (The delimiter string is ignored if this is set).</pre>  <pre> -L  Convert all tokens to lowercase before adding to the dictionary.</pre>  <pre> -S  Ignore words that are in the stoplist.</pre>  <pre> -stemmer &lt;spec&gt;  The stemming algorithm (classname plus parameters) to use.</pre>  <pre> -M &lt;int&gt;  The minimum term frequency (default = 1).</pre>  <pre> -O  If this is set, the maximum number of words and the   minimum term frequency are not enforced on a per-class   basis but based on the documents in all the classes   (even if a class attribute is set).</pre>    <!-- options-end --><P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#setOptions(java.lang.String[])">setOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>options</CODE> - the list of options as an array of strings<DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE> - if an option is not supported</DL></DD></DL><HR><A NAME="getOptions()"><!-- --></A><H3>getOptions</H3><PRE>public java.lang.String[] <B>getOptions</B>()</PRE><DL><DD>Gets the current settings of the filter.<P><DD><DL><DT><B>Specified 
by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#getOptions()">getOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>an array of strings suitable for passing to setOptions</DL></DD></DL><HR><A NAME="getCapabilities()"><!-- --></A><H3>getCapabilities</H3><PRE>public <A HREF="../../../../weka/core/Capabilities.html" title="class in weka.core">Capabilities</A> <B>getCapabilities</B>()</PRE><DL><DD>Returns the Capabilities of this filter.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/CapabilitiesHandler.html#getCapabilities()">getCapabilities</A></CODE> in interface <CODE><A HREF="../../../../weka/core/CapabilitiesHandler.html" title="interface in weka.core">CapabilitiesHandler</A></CODE><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#getCapabilities()">getCapabilities</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>the capabilities of this object<DT><B>See Also:</B><DD><A HREF="../../../../weka/core/Capabilities.html" title="class in weka.core"><CODE>Capabilities</CODE></A></DL></DD></DL><HR><A NAME="setInputFormat(weka.core.Instances)"><!-- --></A><H3>setInputFormat</H3><PRE>public boolean <B>setInputFormat</B>(<A HREF="../../../../weka/core/Instances.html" title="class in weka.core">Instances</A>&nbsp;instanceInfo)                       throws java.lang.Exception</PRE><DL><DD>Sets the format of the input instances.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#setInputFormat(weka.core.Instances)">setInputFormat</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>instanceInfo</CODE> - an 
Instances object containing the input  instance structure (any instances contained in the object are  ignored - only the structure is required).<DT><B>Returns:</B><DD>true if the outputFormat may be collected immediately<DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE> - if the input format can't be set  successfully</DL></DD></DL><HR><A NAME="input(weka.core.Instance)"><!-- --></A><H3>input</H3><PRE>public boolean <B>input</B>(<A HREF="../../../../weka/core/Instance.html" title="class in weka.core">Instance</A>&nbsp;instance)              throws java.lang.Exception</PRE><DL><DD>Input an instance for filtering. Filter requires all training instances be read before producing output.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#input(weka.core.Instance)">input</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>instance</CODE> - the input instance.<DT><B>Returns:</B><DD>true if the filtered instance may now be collected with output().<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if no input structure has been defined.<DD><CODE>java.lang.NullPointerException</CODE> - if the input format has not been defined.<DD><CODE>java.lang.Exception</CODE> - if the input instance was not of the correct  format or if there was a problem with the filtering.</DL></DD></DL><HR><A NAME="batchFinished()"><!-- --></A><H3>batchFinished</H3><PRE>public boolean <B>batchFinished</B>()                      throws java.lang.Exception</PRE><DL><DD>Signify that this batch of input to the filter is finished.  
If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#batchFinished()">batchFinished</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if there are instances pending output.<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if no input structure has been defined.<DD><CODE>java.lang.NullPointerException</CODE> - if no input structure has been defined.<DD><CODE>java.lang.Exception</CODE> - if there was a problem finishing the batch.</DL></DD></DL><HR><A NAME="globalInfo()"><!-- --></A><H3>globalInfo</H3><PRE>public java.lang.String <B>globalInfo</B>()</PRE><DL><DD>Returns a string describing this filter<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>a description of the filter suitable for displaying in the explorer/experimenter gui</DL></DD></DL><HR><A NAME="getOutputWordCounts()"><!-- --></A><H3>getOutputWordCounts</H3><PRE>public boolean <B>getOutputWordCounts</B>()</PRE><DL><DD>Gets whether output instances contain 0 or 1 indicating word presence, or word counts.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if word counts should be output.</DL></DD></DL><HR><A NAME="setOutputWordCounts(boolean)"><!-- --></A><H3>setOutputWordCounts</H3><PRE>public void <B>setOutputWordCounts</B>(boolean&nbsp;outputWordCounts)</PRE><DL><DD>Sets whether output instances contain 0 or 1 indicating word presence, or word counts.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>outputWordCounts</CODE> - true if word counts should be output.</DL></DD></DL><HR><A NAME="outputWordCountsTipText()"><!-- --></A><H3>outputWordCountsTipText</H3><PRE>public java.lang.String <B>outputWordCountsTipText</B>()</PRE><DL><DD>Returns the tip text for this 
property<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>tip text for this property suitable for displaying in the explorer/experimenter gui</DL></DD></DL><HR><A NAME="getDelimiters()"><!-- --></A><H3>getDelimiters</H3><PRE>