stringtowordvector.html

From "The most commonly used data mining tool. Because open source" · HTML code · 1,646 lines in total · page 1 of 5
<CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setPeriodicPruning(double)">setPeriodicPruning</A></B>(double&nbsp;newPeriodicPruning)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets the rate at which the dictionary is periodically pruned, as a percentage of the dataset size.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setSelectedRange(java.lang.String)">setSelectedRange</A></B>(java.lang.String&nbsp;newSelectedRange)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets the value of m_SelectedRange.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setStemmer(weka.core.stemmers.Stemmer)">setStemmer</A></B>(<A HREF="../../../../weka/core/stemmers/Stemmer.html" title="interface in weka.core.stemmers">Stemmer</A>&nbsp;value)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets the stemming algorithm to use; null means no stemming at all (i.e., the NullStemmer is used).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setStopwords(java.io.File)">setStopwords</A></B>(java.io.File&nbsp;value)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets the file containing the stopwords; null or a directory unsets the stopwords.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT 
SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setTFTransform(boolean)">setTFTransform</A></B>(boolean&nbsp;TFTransform)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets whether the word frequencies should be transformed into log(1+fij), where fij is the frequency of word i in document (instance) j.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setTokenizer(weka.core.tokenizers.Tokenizer)">setTokenizer</A></B>(<A HREF="../../../../weka/core/tokenizers/Tokenizer.html" title="class in weka.core.tokenizers">Tokenizer</A>&nbsp;value)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets the tokenizer algorithm to use.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setUseStoplist(boolean)">setUseStoplist</A></B>(boolean&nbsp;useStoplist)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets whether words that are on the stoplist are to be ignored (the stoplist is in weka.core.StopWords).</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;void</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#setWordsToKeep(int)">setWordsToKeep</A></B>(int&nbsp;newWordsToKeep)</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Sets the number of words (per class if there is a class attribute assigned) to attempt to keep.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" 
WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#stemmerTipText()">stemmerTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#stopwordsTipText()">stopwordsTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#TFTransformTipText()">TFTransformTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#tokenizerTipText()">tokenizerTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#useStoplistTipText()">useStoplistTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR><TR BGCOLOR="white" 
CLASS="TableRowColor"><TD ALIGN="right" VALIGN="top" WIDTH="1%"><FONT SIZE="-1"><CODE>&nbsp;java.lang.String</CODE></FONT></TD><TD><CODE><B><A HREF="../../../../weka/filters/unsupervised/attribute/StringToWordVector.html#wordsToKeepTipText()">wordsToKeepTipText</A></B>()</CODE><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns the tip text for this property.</TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_weka.filters.Filter"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class weka.filters.<A HREF="../../../../weka/filters/Filter.html" title="class in weka.filters">Filter</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../weka/filters/Filter.html#batchFilterFile(weka.filters.Filter, java.lang.String[])">batchFilterFile</A>, <A HREF="../../../../weka/filters/Filter.html#filterFile(weka.filters.Filter, java.lang.String[])">filterFile</A>, <A HREF="../../../../weka/filters/Filter.html#getCapabilities(weka.core.Instances)">getCapabilities</A>, <A HREF="../../../../weka/filters/Filter.html#getOutputFormat()">getOutputFormat</A>, <A HREF="../../../../weka/filters/Filter.html#isFirstBatchDone()">isFirstBatchDone</A>, <A HREF="../../../../weka/filters/Filter.html#isNewBatch()">isNewBatch</A>, <A HREF="../../../../weka/filters/Filter.html#isOutputFormatDefined()">isOutputFormatDefined</A>, <A HREF="../../../../weka/filters/Filter.html#makeCopies(weka.filters.Filter, int)">makeCopies</A>, <A HREF="../../../../weka/filters/Filter.html#makeCopy(weka.filters.Filter)">makeCopy</A>, <A HREF="../../../../weka/filters/Filter.html#numPendingOutput()">numPendingOutput</A>, <A HREF="../../../../weka/filters/Filter.html#output()">output</A>, <A HREF="../../../../weka/filters/Filter.html#outputPeek()">outputPeek</A>, <A 
HREF="../../../../weka/filters/Filter.html#toString()">toString</A>, <A HREF="../../../../weka/filters/Filter.html#useFilter(weka.core.Instances, weka.filters.Filter)">useFilter</A>, <A HREF="../../../../weka/filters/Filter.html#wekaStaticWrapper(weka.filters.Sourcable, java.lang.String, weka.core.Instances, weka.core.Instances)">wekaStaticWrapper</A></CODE></TD></TR></TABLE>&nbsp;<A NAME="methods_inherited_from_class_java.lang.Object"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Methods inherited from class java.lang.Object</B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE>equals, getClass, hashCode, notify, notifyAll, wait, wait, wait</CODE></TD></TR></TABLE>&nbsp;<P><!-- ============ FIELD DETAIL =========== --><A NAME="field_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Field Detail</B></FONT></TH></TR></TABLE><A NAME="FILTER_NONE"><!-- --></A><H3>FILTER_NONE</H3><PRE>public static final int <B>FILTER_NONE</B></PRE><DL><DD>normalization: No normalization.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NONE">Constant Field Values</A></DL></DL><HR><A NAME="FILTER_NORMALIZE_ALL"><!-- --></A><H3>FILTER_NORMALIZE_ALL</H3><PRE>public static final int <B>FILTER_NORMALIZE_ALL</B></PRE><DL><DD>normalization: Normalize all data.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_ALL">Constant Field Values</A></DL></DL><HR><A NAME="FILTER_NORMALIZE_TEST_ONLY"><!-- --></A><H3>FILTER_NORMALIZE_TEST_ONLY</H3><PRE>public static final int <B>FILTER_NORMALIZE_TEST_ONLY</B></PRE><DL><DD>normalization: Normalize test data 
only.<P><DL><DT><B>See Also:</B><DD><A HREF="../../../../constant-values.html#weka.filters.unsupervised.attribute.StringToWordVector.FILTER_NORMALIZE_TEST_ONLY">Constant Field Values</A></DL></DL><HR><A NAME="TAGS_FILTER"><!-- --></A><H3>TAGS_FILTER</H3><PRE>public static final <A HREF="../../../../weka/core/Tag.html" title="class in weka.core">Tag</A>[] <B>TAGS_FILTER</B></PRE><DL><DD>Specifies whether a document's (instance's) word frequencies are to be normalized.  They are normalized to the average length of the documents specified as input format.<P><DL></DL></DL><!-- ========= CONSTRUCTOR DETAIL ======== --><A NAME="constructor_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Constructor Detail</B></FONT></TH></TR></TABLE><A NAME="StringToWordVector()"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>()</PRE><DL><DD>Default constructor. 
Targets 1000 words in the output.<P></DL><HR><A NAME="StringToWordVector(int)"><!-- --></A><H3>StringToWordVector</H3><PRE>public <B>StringToWordVector</B>(int&nbsp;wordsToKeep)</PRE><DL><DD>Constructor that allows specification of the target number of words in the output.<P><DL><DT><B>Parameters:</B><DD><CODE>wordsToKeep</CODE> - the number of words in the output vector (per class if assigned).</DL></DL><!-- ============ METHOD DETAIL ========== --><A NAME="method_detail"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="1"><FONT SIZE="+2"><B>Method Detail</B></FONT></TH></TR></TABLE><A NAME="listOptions()"><!-- --></A><H3>listOptions</H3><PRE>public java.util.Enumeration <B>listOptions</B>()</PRE><DL><DD>Returns an enumeration describing the available options.<P><DD><DL><DT><B>Specified by:</B><DD><CODE><A HREF="../../../../weka/core/OptionHandler.html#listOptions()">listOptions</A></CODE> in interface <CODE><A HREF="../../../../weka/core/OptionHandler.html" title="interface in weka.core">OptionHandler</A></CODE></DL></DD><DD><DL><DT><B>Returns:</B><DD>an enumeration of all the available options</DL></DD></DL><HR><A NAME="setOptions(java.lang.String[])"><!-- --></A><H3>setOptions</H3><PRE>public void <B>setOptions</B>(java.lang.String[]&nbsp;options)                throws java.lang.Exception</PRE><DL><DD>Parses a given list of options. <p/>          <!-- options-start --> Valid options are: <p/>  <pre> -C  Output word counts rather than boolean word presence. </pre>  <pre> -R &lt;index1,index2-index4,...&gt;  Specify list of string attributes to convert to words (as weka Range).  (default: select all string attributes)</pre>  <pre> -V  Invert matching sense of column indexes.</pre>  <pre> -P &lt;attribute name prefix&gt;  Specify a prefix for the created attribute names.  
(default: "")</pre>  <pre> -W &lt;number of words to keep&gt;  Specify approximate number of word fields to create.  Surplus words will be discarded.  (default: 1000)</pre>  <pre> -prune-rate &lt;rate as a percentage of dataset&gt;  Specify the rate (e.g., every 10% of the input dataset) at which to periodically prune the dictionary.  -W prunes after creating a full dictionary. You may not have enough memory for this approach.  (default: no periodic pruning)</pre>  <pre> -T  Transform the word frequencies into log(1+fij)  where fij is the frequency of word i in jth document(instance). </pre>  <pre> -I  Transform each word frequency into:  fij*log(num of Documents/num of documents containing word i)    where fij is the frequency of word i in jth document(instance)</pre>  <pre> -N  Whether to 0=not normalize/1=normalize all data/2=normalize test data only  to average length of training documents (default 0=don't normalize).</pre>  <pre> -L  Convert all tokens to lowercase before adding to the dictionary.</pre>  <pre> -S  Ignore words that are in the stoplist.</pre>  <pre> -stemmer &lt;spec&gt;  The stemming algorithm (classname plus parameters) to use.</pre>  <pre> -M &lt;int&gt;  The minimum term frequency (default = 1).</pre>  <pre> -O  If this is set, the maximum number of words and the   minimum term frequency are not enforced on a per-class   basis but based on the documents in all the classes   (even if a class attribute is set).</pre>  <pre> -stopwords &lt;file&gt;  A file containing stopwords to override the default ones.  Using this option automatically sets the flag ('-S') to use the  stoplist if the file exists.  Format: one stopword per line, lines starting with '#'