📄 stringtowordvector.html
<A NAME="FILTER_NORMALIZE_ALL"><!-- --></A><H3>FILTER_NORMALIZE_ALL</H3><PRE>public static final int <B>FILTER_NORMALIZE_ALL</B></PRE><DL><DD>normalization: Normalize all data<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_ALL">Constant Field Values</A></DL></DL><HR><A NAME="FILTER_NORMALIZE_TEST_ONLY"><!-- --></A><H3>FILTER_NORMALIZE_TEST_ONLY</H3><PRE>public static final int <B>FILTER_NORMALIZE_TEST_ONLY</B></PRE><DL><DD>normalization: Normalize test data only<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_TEST_ONLY">Constant Field Values</A></DL></DL><HR><A NAME="TAGS_FILTER"><!-- --></A><H3>TAGS_FILTER</H3><PRE>public static final <A HREF="../../../../weka/core/Tag.html" title="class in weka.core">Tag</A>[] <B>TAGS_FILTER</B></PRE><DL><DD>Specifies whether a document's (instance's) word frequencies are to be normalized. They are normalized to the average length of the documents specified as input format.<P><DL></DL></DL><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="StringToWordVector()"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>()</PRE><DL><DD>Default constructor. 
Targets 1000 words in the output.<P></DL><HR><A NAME="StringToWordVector(int)"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>(int wordsToKeep)</PRE><DL><DD>Constructor that allows specification of the target number of words in the output.<P><DL><DT><B>Parameters:</B><DD><CODE>wordsToKeep</CODE> - the number of words in the output vector (per class if assigned).</DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="listOptions()"><!-- --></A><H3>listOptions</H3><PRE>public java.util.Enumeration <B>listOptions</B>()</PRE><DL><DD>Returns an enumeration describing the available options<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#listOptions()">listOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>an enumeration of all the available options</DL></DD></DL><HR><A NAME="setOptions(java.lang.String[])"><!-- --></A><H3>setOptions</H3><PRE>public void <B>setOptions</B>(java.lang.String[] options) throws java.lang.Exception</PRE><DL><DD>Parses a given list of options. <p/> <!-- options-start --> Valid options are: <p/> <pre> -C Output word counts rather than boolean word presence. </pre> <pre> -D &lt;delimiter set&gt; String containing the set of delimiter characters (default: " \n\t.,:'\"()?!")</pre> <pre> -R &lt;index1,index2-index4,...&gt; Specify list of string attributes to convert to words (as weka Range). (default: select all string attributes)</pre> <pre> -P &lt;attribute name prefix&gt; Specify a prefix for the created attribute names. 
(default: "")</pre> <pre> -W &lt;number of words to keep&gt; Specify approximate number of word fields to create. Surplus words will be discarded. (default: 1000)</pre> <pre> -T Transform the word frequencies into log(1+fij) where fij is the frequency of word i in jth document(instance). </pre> <pre> -I Transform each word frequency into: fij*log(num of Documents/num of documents containing word i) where fij is the frequency of word i in jth document(instance)</pre> <pre> -N Whether to 0=not normalize/1=normalize all data/2=normalize test data only to average length of training documents (default 0=don't normalize).</pre> <pre> -A Only form tokens from contiguous alphabetic sequences (The delimiter string is ignored if this is set).</pre> <pre> -L Convert all tokens to lowercase before adding to the dictionary.</pre> <pre> -S Ignore words that are in the stoplist.</pre> <pre> -stemmer &lt;spec&gt; The stemming algorithm (classname plus parameters) to use.</pre> <pre> -M &lt;int&gt; The minimum term frequency (default = 1).</pre> <pre> -O If this is set, the maximum number of words and the minimum term frequency are not enforced on a per-class basis but based on the documents in all the classes (even if a class attribute is set).</pre> <!-- options-end --><P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#setOptions(java.lang.String[])">setOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>options</CODE> - the list of options as an array of strings<DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE> - if an option is not supported</DL></DD></DL><HR><A NAME="getOptions()"><!-- --></A><H3>getOptions</H3><PRE>public java.lang.String[] <B>getOptions</B>()</PRE><DL><DD>Gets the current settings of the filter.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A 
HREF="../../../../weka/core/OptionHandler.html#getOptions()">getOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>an array of strings suitable for passing to setOptions</DL></DD></DL><HR><A NAME="getCapabilities()"><!-- --></A><H3>getCapabilities</H3><PRE>public <A HREF="../../../../weka/core/Capabilities.html" title="class in weka.core">Capabilities</A> <B>getCapabilities</B>()</PRE><DL><DD>Returns the Capabilities of this filter.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/CapabilitiesHandler.html#getCapabilities()">getCapabilities</A></CODE> in interface <CODE><A HREF="../../../../weka/core/CapabilitiesHandler.html" title="interface in weka.core">CapabilitiesHandler</A></CODE><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#getCapabilities()">getCapabilities</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>the capabilities of this object<DT><B>See Also:</B><DD><A HREF="../../../../weka/core/Capabilities.html" title="class in weka.core"><CODE>Capabilities</CODE></A></DL></DD></DL><HR><A NAME="setInputFormat(weka.core.Instances)"><!-- --></A><H3>setInputFormat</H3><PRE>public boolean <B>setInputFormat</B>(<A HREF="../../../../weka/core/Instances.html" title="class in weka.core">Instances</A> instanceInfo) throws java.lang.Exception</PRE><DL><DD>Sets the format of the input instances.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#setInputFormat(weka.core.Instances)">setInputFormat</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>instanceInfo</CODE> - an Instances object containing the input instance 
structure (any instances contained in the object are ignored - only the structure is required).<DT><B>Returns:</B><DD>true if the outputFormat may be collected immediately<DT><B>Throws:</B><DD><CODE>java.lang.Exception</CODE> - if the input format can't be set successfully</DL></DD></DL><HR><A NAME="input(weka.core.Instance)"><!-- --></A><H3>input</H3><PRE>public boolean <B>input</B>(<A HREF="../../../../weka/core/Instance.html" title="class in weka.core">Instance</A> instance) throws java.lang.Exception</PRE><DL><DD>Input an instance for filtering. Filter requires all training instances be read before producing output.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#input(weka.core.Instance)">input</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>instance</CODE> - the input instance.<DT><B>Returns:</B><DD>true if the filtered instance may now be collected with output().<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if no input structure has been defined.<DD><CODE>java.lang.NullPointerException</CODE> - if the input format has not been defined.<DD><CODE>java.lang.Exception</CODE> - if the input instance was not of the correct format or if there was a problem with the filtering.</DL></DD></DL><HR><A NAME="batchFinished()"><!-- --></A><H3>batchFinished</H3><PRE>public boolean <B>batchFinished</B>() throws java.lang.Exception</PRE><DL><DD>Signify that this batch of input to the filter is finished. 
If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.<P><DD><DL><DT><B>Overrides:</B><DD><CODE><A HREF="../../../../weka/filters/Filter.html#batchFinished()">batchFinished</A></CODE> in class <CODE><A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if there are instances pending output.<DT><B>Throws:</B><DD><CODE>java.lang.IllegalStateException</CODE> - if no input structure has been defined.<DD><CODE>java.lang.NullPointerException</CODE> - if no input structure has been defined.<DD><CODE>java.lang.Exception</CODE> - if there was a problem finishing the batch.</DL></DD></DL><HR><A NAME="globalInfo()"><!-- --></A><H3>globalInfo</H3><PRE>public java.lang.String <B>globalInfo</B>()</PRE><DL><DD>Returns a string describing this filter<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>a description of the filter suitable for displaying in the explorer/experimenter gui</DL></DD></DL><HR><A NAME="getOutputWordCounts()"><!-- --></A><H3>getOutputWordCounts</H3><PRE>public boolean <B>getOutputWordCounts</B>()</PRE><DL><DD>Gets whether output instances contain 0 or 1 indicating word presence, or word counts.<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>true if word counts should be output.</DL></DD></DL><HR><A NAME="setOutputWordCounts(boolean)"><!-- --></A><H3>setOutputWordCounts</H3><PRE>public void <B>setOutputWordCounts</B>(boolean outputWordCounts)</PRE><DL><DD>Sets whether output instances contain 0 or 1 indicating word presence, or word counts.<P><DD><DL></DL></DD><DD><DL><DT><B>Parameters:</B><DD><CODE>outputWordCounts</CODE> - true if word counts should be output.</DL></DD></DL><HR><A NAME="outputWordCountsTipText()"><!-- --></A><H3>outputWordCountsTipText</H3><PRE>public java.lang.String <B>outputWordCountsTipText</B>()</PRE><DL><DD>Returns the tip text for this 
property<P><DD><DL></DL></DD><DD><DL><DT><B>Returns:</B><DD>tip text for this property suitable for displaying in the explorer/experimenter gui</DL></DD></DL><HR><A NAME="getDelimiters()"><!-- --></A><H3>getDelimiters</H3><PRE>