<chapter id="classifiers">
  <title>Classifiers</title>
  <para>This chapter describes some of the classifiers available in <application>Select</application>.</para>
  <para>There are two main types of classifiers in <application>Select</application>: document classifiers and vector classifiers.</para>
  <sect1>
    <title>Document classifiers</title>
    <para>A document classifier takes a document as input.</para>
    <para>All document classifiers have type document.<programlisting>type document # Document classifier</programlisting></para>
    <sect2>
      <title>From classifier</title>
      <para>Classifies documents according to their sender.</para>
      <sect3>
        <title>Options</title>
        <variablelist>
          <varlistentry>
            <term><option>n</option></term>
            <listitem><simpara>Specifies the maximum number of addresses to save.</simpara></listitem>
          </varlistentry>
          <varlistentry>
            <term><option>o</option></term>
            <listitem><simpara>Specifies the address eviction order.</simpara></listitem>
          </varlistentry>
        </variablelist>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name from # Name the classifier from
classifier From # From classifier
type document # Document classifier
options n=100,o=fifo # 100 addresses, fifo order</programlisting></para>
      </sect3>
    </sect2>
    <sect2>
      <title>Reply classifier</title>
      <para>Classifies documents according to threads.</para>
      <sect3>
        <title>Options</title>
        <variablelist>
          <varlistentry>
            <term><option>n</option></term>
            <listitem><simpara>Specifies the maximum number of threads to save.</simpara></listitem>
          </varlistentry>
        </variablelist>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name reply # Name the classifier reply
classifier Reply # Reply classifier
type document # Document classifier
options n=100 # 100 subject entries</programlisting></para>
      </sect3>
    </sect2>
  </sect1>
  <sect1>
    <title>Vector classifiers</title>
    <para>A vector classifier takes a vector as input.</para>
    <para>Vector classifiers can have different type arguments. A multi-class classifier should
have type multi_one:<programlisting>type multi_one # Multi classifier, type ONE_MAX</programlisting>A binary classifier can have one of several types:<programlisting>type multi_rest # Multi classifier, type REST_MAX
type multi_linmax # Multi classifier, type LIN_MAX
type multi_uc # Multi classifier, type UC_MAX</programlisting></para>
    <sect2>
      <title>Alma</title>
      <para>Alma is a binary maximal margin classifier. See <link linkend="Gen01"><citation>Gen01</citation></link> for a description of it.</para>
      <sect3>
        <title>Options</title>
        <para>There are no options.</para>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name alma # Name the classifier alma
classifier Alma # Alma classifier
type multi_linmax # Multi classifier, type LIN_MAX
options # No options
tokenizer alpha # Alpha tokenizer
vectorizer tfidf # TF-IDF vectorizer
normalizer # No normalization</programlisting></para>
      </sect3>
    </sect2>
    <sect2>
      <title>Naive Bayes</title>
      <para>Naive Bayes is a simple probabilistic multi-class classifier. See <link linkend="McCNig98"><citation>McCNig98</citation></link> for a description of it.</para>
      <para>It should only be used with type multi_one.</para>
      <sect3>
        <title>Options</title>
        <para>There are no options.</para>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name nb # Name the classifier nb
classifier NaiveBayes # NaiveBayes classifier
type multi_one # Multi classifier, type ONE_MAX
options # No options
tokenizer alpha # Alpha tokenizer
vectorizer tfidf # TF-IDF vectorizer
normalizer # No normalization</programlisting></para>
      </sect3>
    </sect2>
    <sect2>
      <title>N-gram</title>
      <para>N-gram is a classifier that uses relative entropy. It is well suited to language identification. See <link linkend="SibRey96"><citation>SibRey96</citation></link> for a description of it.</para>
      <para>It should only be used with an n-gram tokenizer and type multi_one.</para>
      <sect3>
        <title>Options</title>
        <para>There are no
options.</para>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name ng # Name the classifier ng
classifier N-gram # N-gram classifier
type multi_one # Multi classifier, type ONE_MAX
options # No options
tokenizer ngram.byte # N-gram byte tokenizer
vectorizer tf # TF vectorizer
normalizer # No normalization</programlisting></para>
      </sect3>
    </sect2>
    <sect2>
      <title>Perceptron</title>
      <para>Perceptron is an old, simple binary classifier. It is described in just about every textbook on machine learning.</para>
      <para>It can be used with type multi_rest, multi_linmax, or multi_uc.</para>
      <sect3>
        <title>Options</title>
        <para>There are no options.</para>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name per # Name the classifier per
classifier Perceptron # Perceptron classifier
type multi_linmax # Multi classifier, type LIN_MAX
options # No options
tokenizer alpha # Alpha tokenizer
vectorizer tfidf # TF-IDF vectorizer
normalizer # No normalization</programlisting></para>
      </sect3>
    </sect2>
    <sect2 id="trivial">
      <title>Trivial classifier</title>
      <para>Classifies either according to class frequency or at random. This is only useful for testing purposes and should not be used in practice.</para>
      <para>It should only be used with a null tokenizer and type multi_one.</para>
      <sect3>
        <title>Options</title>
        <variablelist>
          <varlistentry>
            <term><option>s</option></term>
            <listitem><simpara>Seed (must be nonzero) for the pseudo-random number generator. Sets the classifier to random mode.</simpara></listitem>
          </varlistentry>
        </variablelist>
      </sect3>
      <sect3>
        <title>Example</title>
        <para><programlisting>[classifier]
name triv # Name the classifier triv
classifier Trivial # Trivial classifier
type multi_one # Multi classifier, type ONE_MAX
options s=123 # Random mode, with seed 123
tokenizer null # Null tokenizer
vectorizer # Default vectorizer
normalizer # No normalization</programlisting></para>
      </sect3>
    </sect2>
  </sect1>
</chapter>