<CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setPeriodicPruning(double)">setPeriodicPruning</A></B>(double newPeriodicPruning)</CODE><BR> Sets the rate at which the dictionary is periodically pruned, as a percentage of the dataset size.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setSelectedRange(java.lang.String)">setSelectedRange</A></B>(java.lang.String newSelectedRange)</CODE><BR> Sets the value of m_SelectedRange.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setStemmer(weka.core.stemmers.Stemmer)">setStemmer</A></B>(<A HREF="../../../../weka/core/stemmers/Stemmer.html" title="interface in weka.core.stemmers">Stemmer</A> value)</CODE><BR> Sets the stemming algorithm to use; null means no stemming at all (i.e., the NullStemmer is used).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setStopwords(java.io.File)">setStopwords</A></B>(java.io.File value)</CODE><BR> Sets the file containing the stopwords; null or a directory unsets the stopwords.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setTFTransform(boolean)">setTFTransform</A></B>(boolean TFTransform)</CODE><BR> Sets whether the word frequencies should be transformed into log(1+fij) where fij is the 
frequency of word i in document (instance) j.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setTokenizer(weka.core.tokenizers.Tokenizer)">setTokenizer</A></B>(<A HREF="../../../../weka/core/tokenizers/Tokenizer.html" title="class in weka.core.tokenizers">Tokenizer</A> value)</CODE><BR> Sets the tokenizer algorithm to use.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setUseStoplist(boolean)">setUseStoplist</A></B>(boolean useStoplist)</CODE><BR> Sets whether the words that are on a stoplist are to be ignored (the stoplist is in weka.core.StopWords).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setWordsToKeep(int)">setWordsToKeep</A></B>(int newWordsToKeep)</CODE><BR> Sets the number of words (per class if there is a class attribute assigned) to attempt to keep.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#stemmerTipText()">stemmerTipText</A></B>()</CODE><BR> Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#stopwordsTipText()">stopwordsTipText</A></B>()</CODE><BR> Returns the tip text for 
this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#TFTransformTipText()">TFTransformTipText</A></B>()</CODE><BR> Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#tokenizerTipText()">tokenizerTipText</A></B>()</CODE><BR> Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#useStoplistTipText()">useStoplistTipText</A></B>()</CODE><BR> Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE> java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#wordsToKeepTipText()">wordsToKeepTipText</A></B>()</CODE><BR> Returns the tip text for this property.</TD></TR></TABLE> <A NAME="methods_inherited_from_class_weka.filters.Filter"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class weka.filters.<A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../weka/filters/Filter.html#batchFilterFile(weka.filters.Filter, java.lang.String[])">batchFilterFile</A>, <A 
HREF="../../../../weka/filters/Filter.html#filterFile(weka.filters.Filter, java.lang.String[])">filterFile</A>, <A HREF="../../../../weka/filters/Filter.html#getCapabilities(weka.core.Instances)">getCapabilities</A>, <A HREF="../../../../weka/filters/Filter.html#getOutputFormat()">getOutputFormat</A>, <A HREF="../../../../weka/filters/Filter.html#isFirstBatchDone()">isFirstBatchDone</A>, <A HREF="../../../../weka/filters/Filter.html#isNewBatch()">isNewBatch</A>, <A HREF="../../../../weka/filters/Filter.html#isOutputFormatDefined()">isOutputFormatDefined</A>, <A HREF="../../../../weka/filters/Filter.html#makeCopies(weka.filters.Filter, int)">makeCopies</A>, <A HREF="../../../../weka/filters/Filter.html#makeCopy(weka.filters.Filter)">makeCopy</A>, <A HREF="../../../../weka/filters/Filter.html#numPendingOutput()">numPendingOutput</A>, <A HREF="../../../../weka/filters/Filter.html#output()">output</A>, <A HREF="../../../../weka/filters/Filter.html#outputPeek()">outputPeek</A>, <A HREF="../../../../weka/filters/Filter.html#toString()">toString</A>, <A HREF="../../../../weka/filters/Filter.html#useFilter(weka.core.Instances, weka.filters.Filter)">useFilter</A>, <A HREF="../../../../weka/filters/Filter.html#wekaStaticWrapper(weka.filters.Sourcable, java.lang.String, weka.core.Instances, weka.core.Instances)">wekaStaticWrapper</A></CODE></TD></TR></TABLE> <A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>equals, getClass, hashCode, notify, notifyAll, wait, wait, wait</CODE></TD></TR></TABLE> <P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH 
ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="FILTER_NONE"><!-- --></A><H3>FILTER_NONE</H3><PRE>public static final int <B>FILTER_NONE</B></PRE><DL><DD>normalization: No normalization.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NONE">Constant Field Values</A></DL></DL><HR><A NAME="FILTER_NORMALIZE_ALL"><!-- --></A><H3>FILTER_NORMALIZE_ALL</H3><PRE>public static final int <B>FILTER_NORMALIZE_ALL</B></PRE><DL><DD>normalization: Normalize all data.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_ALL">Constant Field Values</A></DL></DL><HR><A NAME="FILTER_NORMALIZE_TEST_ONLY"><!-- --></A><H3>FILTER_NORMALIZE_TEST_ONLY</H3><PRE>public static final int <B>FILTER_NORMALIZE_TEST_ONLY</B></PRE><DL><DD>normalization: Normalize test data only.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_TEST_ONLY">Constant Field Values</A></DL></DL><HR><A NAME="TAGS_FILTER"><!-- --></A><H3>TAGS_FILTER</H3><PRE>public static final <A HREF="../../../../weka/core/Tag.html" title="class in weka.core">Tag</A>[] <B>TAGS_FILTER</B></PRE><DL><DD>Specifies whether a document's (instance's) word frequencies are to be normalized. They are normalized to the average length of the documents specified as the input format.<P><DL></DL></DL><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="StringToWordVector()"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>()</PRE><DL><DD>Default constructor. 
Targets 1000 words in the output.<P></DL><HR><A NAME="StringToWordVector(int)"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>(int wordsToKeep)</PRE><DL><DD>Constructor that allows specification of the target number of words in the output.<P><DL><DT><B>Parameters:</B><DD><CODE>wordsToKeep</CODE> - the number of words in the output vector (per class if assigned).</DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="listOptions()"><!-- --></A><H3>listOptions</H3><PRE>public java.util.Enumeration <B>listOptions</B>()</PRE><DL><DD>Returns an enumeration describing the available options.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#listOptions()">listOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>an enumeration of all the available options</DL></DD></DL><HR><A NAME="setOptions(java.lang.String[])"><!-- --></A><H3>setOptions</H3><PRE>public void <B>setOptions</B>(java.lang.String[] options) throws java.lang.Exception</PRE><DL><DD>Parses a given list of options. <p/> <!-- options-start --> Valid options are: <p/> <pre> -C Output word counts rather than boolean word presence. </pre> <pre> -R <index1,index2-index4,...> Specify list of string attributes to convert to words (as weka Range). (default: select all string attributes)</pre> <pre> -V Invert matching sense of column indexes.</pre> <pre> -P <attribute name prefix> Specify a prefix for the created attribute names. (default: "")</pre> <pre> -W <number of words to keep> Specify approximate number of word fields to create. 
Surplus words will be discarded. (default: 1000)</pre> <pre> -prune-rate <rate as a percentage of dataset> Specify the rate (e.g., every 10% of the input dataset) at which to periodically prune the dictionary. -W prunes after creating a full dictionary. You may not have enough memory for this approach. (default: no periodic pruning)</pre> <pre> -T Transform the word frequencies into log(1+fij) where fij is the frequency of word i in the jth document (instance). </pre> <pre> -I Transform each word frequency into: fij*log(num of Documents/num of documents containing word i) where fij is the frequency of word i in the jth document (instance)</pre> <pre> -N Whether to 0=not normalize/1=normalize all data/2=normalize test data only to average length of training documents (default 0=don't normalize).</pre> <pre> -L Convert all tokens to lowercase before adding to the dictionary.</pre> <pre> -S Ignore words that are in the stoplist.</pre> <pre> -stemmer <spec> The stemming algorithm (classname plus parameters) to use.</pre> <pre> -M <int> The minimum term frequency (default = 1).</pre> <pre> -O If this is set, the maximum number of words and the minimum term frequency are not enforced on a per-class basis but based on the documents in all the classes (even if a class attribute is set).</pre> <pre> -stopwords <file> A file containing stopwords to override the default ones. Using this option automatically sets the flag ('-S') to use the stoplist if the file exists. Format: one stopword per line, lines starting with '#'