📄 stopwordfilter.java

wekaUT: Weka-based classifiers for semi-supervised learning, developed at the University of Texas at Austin
Language: Java
Page 1 of 2
package weka.datagenerators;

import weka.core.Option;
import weka.core.Utils;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Tosses words that are found in the stop list.  By default, it
 * borrows the stop list in the SMART information retrieval system.
 * The user may define his own stop list, or append stop words to the
 * built-in list.
 *
 * <p><b>WEKA options:</b>
 * <ul>
 *   <li><code>-w</code> - Whether input is lowercase only.  If this
 *   flag is set, then string comparison will be case-sensitive, which
 *   is somewhat less costly than case-insensitive comparison.  By
 *   default it is unset.
 *
 *   <li><code>-e</code> - Whether the default SMART stop list is
 *   <b>skipped</b>.  By default it is unset.
 *
 *   <li><code>-f &lt;str&gt;[:&lt;str&gt;...]</code> - A
 *   colon-separated list of stop list files.  In those files, stop
 *   words are listed in separate lines.  If there are multiple words
 *   on the same line, then only the first word will be read.  Stop
 *   words are converted to lowercase before being used.  Leading and
 *   trailing whitespace and empty lines are ignored.  If
 *   <code>filter.stop_word.use_default</code> is true, then those
 *   stop words will be appended to the default SMART stop list.  By
 *   default the list is empty.
 * </ul>
 *
 * @author ywwong
 * @version $Id: StopWordFilter.java,v 1.1.1.1 2003/01/22 07:48:27 mbilenko Exp $
 */
class StopWordFilter implements TokenFilter {

    /** The default SMART stop list. */
    protected static final String[] m_aDefStopList = {
        "a",
        "a's",
        "able",
        "about",
        "above",
        "according",
        "accordingly",
        "across",
        "actually",
        "after",
        "afterwards",
        "again",
        "against",
        "ain't",
        "all",
        "allow",
        "allows",
        "almost",
        "alone",
        "along",
        "already",
        "also",
        "although",
        "always",
        "am",
        "among",
        "amongst",
        "an",
        "and",
        "another",
        "any",
        "anybody",
        "anyhow",
        "anyone",
        "anything",
        "anyway",
        "anyways",
        "anywhere",
        "apart",
        "appear",
        "appreciate",
        "appropriate",
        "are",
        "aren't",
        "around",
        "as",
        "aside",
        "ask",
        "asking",
        "associated",
        "at",
        "available",
        "away",
        "awfully",
        "b",
        "be",
        "became",
        "because",
        "become",
        "becomes",
        "becoming",
        "been",
        "before",
        "beforehand",
        "behind",
        "being",
        "believe",
        "below",
        "beside",
        "besides",
        "best",
        "better",
        "between",
        "beyond",
        "both",
        "brief",
        "but",
        "by",
        "c",
        "c'mon",
        "c's",
        "came",
        "can",
        "can't",
        "cannot",
        "cant",
        "cause",
        "causes",
        "certain",
        "certainly",
        "changes",
        "clearly",
        "co",
        "com",
        "come",
        "comes",
        "concerning",
        "consequently",
        "consider",
        "considering",
        "contain",
        "containing",
        "contains",
        "corresponding",
        "could",
        "couldn't",
        "course",
        "currently",
        "d",
        "definitely",
        "described",
        "despite",
        "did",
        "didn't",
        "different",
        "do",
        "does",
        "doesn't",
        "doing",
        "don't",
        "done",
        "down",
        "downwards",
        "during",
        "e",
        "each",
        "edu",
        "eg",
        "eight",
        "either",
        "else",
        "elsewhere",
        "enough",
        "entirely",
        "especially",
        "et",
        "etc",
        "even",
        "ever",
        "every",
        "everybody",
        "everyone",
        "everything",
        "everywhere",
        "ex",
        "exactly",
        "example",
        "except",
        "f",
        "far",
        "few",
        "fifth",
        "first",
        "five",
        "followed",
        "following",
        "follows",
        "for",
        "former",
        "formerly",
        "forth",
        "four",
        "from",
        "further",
        "furthermore",
        "g",
        "get",
        "gets",
        "getting",
        "given",
        "gives",
        "go",
        "goes",
        "going",
        "gone",
        "got",
        "gotten",
        "greetings",
        "h",
        "had",
        "hadn't",
        "happens",
        "hardly",
        "has",
        "hasn't",
        "have",
        "haven't",
        "having",
        "he",
        "he's",
        "hello",
        "help",
        "hence",
        "her",
        "here",
        "here's",
        "hereafter",
        "hereby",
        "herein",
        "hereupon",
        "hers",
        "herself",
        "hi",
        "him",
        "himself",
        "his",
        "hither",
        "hopefully",
        "how",
        "howbeit",
        "however",
        "i",
        "i'd",
        "i'll",
        "i'm",
        "i've",
        "ie",
        "if",
        "ignored",
        "immediate",
        "in",
        "inasmuch",
        "inc",
        "indeed",
        "indicate",
        "indicated",
        "indicates",
        "inner",
        "insofar",
        "instead",
        "into",
        "inward",
        "is",
        "isn't",
        "it",
        "it'd",
        "it'll",
        "it's",
        "its",
        "itself",
        "j",
        "just",
        "k",
        "keep",
        "keeps",
        "kept",
        "know",
        "knows",
        "known",
        "l",
        "last",
        "lately",
        "later",
        "latter",
        "latterly",
        "least",
        "less",
        "lest",
        "let",
        "let's",
        "like",
        "liked",
        "likely",
        "little",
        "look",
        "looking",
        "looks",
        "ltd",
        "m",
        "mainly",
        "many",
        "may",
        "maybe",
        "me",
        "mean",
        "meanwhile",
        "merely",
        "might",
        "more",
        "moreover",
        "most",
        "mostly",
        "much",
        "must",
        "my",
        "myself",
        "n",
        "name",
        "namely",
        "nd",
        "near",
        "nearly",
        "necessary",
        "need",
        "needs",
        "neither",
        "never",
        "nevertheless",
        "new",
        "next",
        "nine",
        "no",
        "nobody",
        "non",
        "none",
        "noone",
        "nor",
        "normally",
        "not",
        "nothing",
        "novel",
        "now",
        "nowhere",
        "o",
        "obviously",
        "of",
        "off",
        "often",
        "oh",
        "ok",
        "okay",
        "old",
        "on",
        "once",
        "one",
        "ones",
        "only",
        "onto",

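The listing above stops partway through the default stop list, so the file-loading code is not visible on this page. As an illustration only, the rules stated in the class Javadoc for `-f` stop list files (one stop word per line, only the first word on a line is read, words are lowercased, surrounding whitespace and empty lines are ignored) and the `-w` comparison behaviour could be sketched as below. The class `StopListSketch` and its methods `parseStopList` and `isStopWord` are hypothetical names introduced for this sketch. They are not part of wekaUT and may differ from what the real `StopWordFilter` does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the wekaUT source. */
final class StopListSketch {

    private StopListSketch() {}

    /**
     * Builds a stop word set from the lines of a stop list file, following
     * the rules in the Javadoc: blank lines are skipped, surrounding
     * whitespace is ignored, only the first word on a line is kept, and
     * every word is lowercased. The lines could come from, for example,
     * java.nio.file.Files.readAllLines (an assumption of this sketch).
     */
    static Set<String> parseStopList(Iterable<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty or whitespace-only lines
            }
            // Keep only the first whitespace-delimited word on the line.
            String first = trimmed.split("\\s+")[0];
            words.add(first.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    /**
     * Tests a token against the stop list. When lowercaseOnly is true (the
     * -w case) the token is compared as-is, which is case-sensitive;
     * otherwise it is lowercased before the lookup.
     */
    static boolean isStopWord(Set<String> stopWords, String token,
                              boolean lowercaseOnly) {
        String key = lowercaseOnly ? token : token.toLowerCase(Locale.ROOT);
        return stopWords.contains(key);
    }

    public static void main(String[] args) {
        Set<String> stop = parseStopList(
            Arrays.asList("  The  ", "", "foo bar", "AIN'T"));
        System.out.println(isStopWord(stop, "The", false)); // true
        System.out.println(isStopWord(stop, "The", true));  // false
        System.out.println(stop.contains("bar"));           // false
    }
}
```

Storing the lowercased words in a `HashSet` keeps each lookup at roughly constant cost. That is one plausible reason the Javadoc calls the case-sensitive path "somewhat less costly": it skips the per-token lowercasing step.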