racesearch.java

来自「这是关于数据挖掘的一些算法」· Java 代码 · 共 1,770 行 · 第 1/5 页
JAVA
1,770 行
/* *    This program is free software; you can redistribute it and/or modify *    it under the terms of the GNU General Public License as published by *    the Free Software Foundation; either version 2 of the License, or *    (at your option) any later version. * *    This program is distributed in the hope that it will be useful, *    but WITHOUT ANY WARRANTY; without even the implied warranty of *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the *    GNU General Public License for more details. * *    You should have received a copy of the GNU General Public License *    along with this program; if not, write to the Free Software *    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. *//* *    RaceSearch.java *    Copyright (C) 2000 University of Waikato, Hamilton, New Zealand * */package weka.attributeSelection;import weka.core.Instance;import weka.core.Instances;import weka.core.Option;import weka.core.OptionHandler;import weka.core.SelectedTag;import weka.core.Statistics;import weka.core.Tag;import weka.core.TechnicalInformation;import weka.core.TechnicalInformation.Type;import weka.core.TechnicalInformation.Field;import weka.core.TechnicalInformationHandler;import weka.core.Utils;import weka.experiment.PairedStats;import weka.experiment.Stats;import java.util.BitSet;import java.util.Enumeration;import java.util.Random;import java.util.Vector;/**  <!-- globalinfo-start --> * Races the cross validation error of competing attribute subsets. Use in conjuction with a ClassifierSubsetEval. RaceSearch has four modes:<br/> * <br/> * forward selection races all single attribute additions to a base set (initially  no attributes), selects the winner to become the new base set and then iterates until there is no improvement over the base set. <br/> * <br/> * Backward elimination is similar but the initial base set has all attributes included and races all single attribute deletions. <br/> * <br/> * Schemata search is a bit different. Each iteration a series of races are run in parallel. Each race in a set determines whether a particular attribute should be included or not---ie the race is between the attribute being "in" or "out". The other attributes for this race are included or excluded randomly at each point in the evaluation. As soon as one race has a clear winner (ie it has been decided whether a particular attribute should be inor not) then the next set of races begins, using the result of the winning race from the previous iteration as new base set.<br/> * <br/> * Rank race first ranks the attributes using an attribute evaluator and then races the ranking. The race includes no attributes, the top ranked attribute, the top two attributes, the top three attributes, etc.<br/> * <br/> * It is also possible to generate a raked list of attributes through the forward racing process. If generateRanking is set to true then a complete forward race will be run---that is, racing continues until all attributes have been selected. The order that they are added in determines a complete ranking of all the attributes.<br/> * <br/> * Racing uses paired and unpaired t-tests on cross-validation errors of competing subsets. When there is a significant difference between the means of the errors of two competing subsets then the poorer of the two can be eliminated from the race. Similarly, if there is no significant difference between the mean errors of two competing subsets and they are within some threshold of each other, then one can be eliminated from the race.<br/> * <br/> * For more information see:<br/> * <br/> * Andrew W. Moore, Mary S. Lee: Efficient Algorithms for Minimizing Cross Validation Error. In: Eleventh International Conference on Machine Learning, 190-198, 1994. * <p/> <!-- globalinfo-end --> * <!-- technical-bibtex-start --> * BibTeX: * <pre> * &#64;inproceedings{Moore1994, *    author = {Andrew W. Moore and Mary S. Lee}, *    booktitle = {Eleventh International Conference on Machine Learning}, *    pages = {190-198}, *    publisher = {Morgan Kaufmann}, *    title = {Efficient Algorithms for Minimizing Cross Validation Error}, *    year = {1994} * } * </pre> * <p/> <!-- technical-bibtex-end --> * <!-- options-start --> * Valid options are: <p/> *  * <pre> -R &lt;0 = forward | 1 = backward race | 2 = schemata | 3 = rank&gt; *  Type of race to perform. *  (default = 0).</pre> *  * <pre> -L &lt;significance&gt; *  Significance level for comaparisons *  (default = 0.001(forward/backward/rank)/0.01(schemata)).</pre> *  * <pre> -T &lt;threshold&gt; *  Threshold for error comparison. *  (default = 0.001).</pre> *  * <pre> -A &lt;attribute evaluator&gt; *  Attribute ranker to use if doing a  *  rank search. Place any *  evaluator options LAST on  *  the command line following a "--". *  eg. -A weka.attributeSelection.GainRatioAttributeEval ... -- -M. *  (default = GainRatioAttributeEval)</pre> *  * <pre> -F &lt;0 = 10 fold | 1 = leave-one-out&gt; *  Folds for cross validation *  (default = 0 (1 if schemata race)</pre> *  * <pre> -Q *  Generate a ranked list of attributes. *  Forces the search to be forward *  and races until all attributes have *  selected, thus producing a ranking.</pre> *  * <pre> -N &lt;num to select&gt; *  Specify number of attributes to retain from  *  the ranking. Overides -T. Use in conjunction with -Q</pre> *  * <pre> -J &lt;threshold&gt; *  Specify a theshold by which attributes *  may be discarded from the ranking. *  Use in conjuction with -Q</pre> *  * <pre> -Z *  Verbose output for monitoring the search.</pre> *  * <pre>  * Options specific to evaluator weka.attributeSelection.GainRatioAttributeEval: * </pre> *  * <pre> -M *  treat missing values as a seperate value.</pre> *  <!-- options-end --> * * @author Mark Hall (mhall@cs.waikato.ac.nz) * @version $Revision: 1.24 $ */public class RaceSearch   extends ASSearch   implements RankedOutputSearch, OptionHandler, TechnicalInformationHandler {  /** for serialization */  static final long serialVersionUID = 4015453851212985720L;  /** the training instances */  private Instances m_Instances = null;  /** search types */  private static final int FORWARD_RACE = 0;  private static final int BACKWARD_RACE = 1;  private static final int SCHEMATA_RACE = 2;  private static final int RANK_RACE = 3;  public static final Tag [] TAGS_SELECTION = {    new Tag(FORWARD_RACE, "Forward selection race"),    new Tag(BACKWARD_RACE, "Backward elimination race"),    new Tag(SCHEMATA_RACE, "Schemata race"),    new Tag(RANK_RACE, "Rank race")      };    /** the selected search type */  private int m_raceType = FORWARD_RACE;    /** xval types */  private static final int TEN_FOLD = 0;  private static final int LEAVE_ONE_OUT = 1;  public static final Tag [] XVALTAGS_SELECTION = {    new Tag(TEN_FOLD, "10 Fold"),    new Tag(LEAVE_ONE_OUT, "Leave-one-out"),      };  /** the selected xval type */  private int m_xvalType = TEN_FOLD;    /** the class index */  private int m_classIndex;  /** the number of attributes in the data */  private int m_numAttribs;  /** the total number of partially/fully evaluated subsets */  private int m_totalEvals;  /** holds the merit of the best subset found */  private double m_bestMerit = -Double.MAX_VALUE;  /** the subset evaluator to use */  private HoldOutSubsetEvaluator m_theEvaluator = null;  /** the significance level for comparisons */  private double m_sigLevel = 0.001;  /** threshold for comparisons */  private double m_delta = 0.001;  /** the number of samples above which to begin testing for similarity      between competing subsets */  private int m_samples = 20;  /** number of cross validation folds---equal to the number of instances      for leave-one-out cv */  private int m_numFolds = 10;  /** the attribute evaluator to generate the initial ranking when      doing a rank race */  private ASEvaluation m_ASEval = new GainRatioAttributeEval();  /** will hold the attribute ranking produced by the above attribute      evaluator if doing a rank search */  private int [] m_Ranking;  /** verbose output for monitoring the search and debugging */  private boolean m_debug = false;  /** If true then produce a ranked list of attributes by fully traversing      a forward hillclimb race */  private boolean m_rankingRequested = false;  /** The ranked list of attributes produced if m_rankingRequested is true */  private double [][] m_rankedAtts;  /** The number of attributes ranked so far (if ranking is requested) */  private int m_rankedSoFar;  /** The number of attributes to retain if a ranking is requested. -1      indicates that all attributes are to be retained. Has precedence over      m_threshold */  private int m_numToSelect = -1;  private int m_calculatedNumToSelect = -1;  /** the threshold for removing attributes if ranking is requested */  private double m_threshold = -Double.MAX_VALUE;  /**   * Returns a string describing this search method   * @return a description of the search method suitable for   * displaying in the explorer/experimenter gui   */  public String globalInfo() {    return "Races the cross validation error of competing "      +"attribute subsets. Use in conjuction with a ClassifierSubsetEval. "      +"RaceSearch has four modes:\n\nforward selection "      +"races all single attribute additions to a base set (initially "      +" no attributes), selects the winner to become the new base set "      +"and then iterates until there is no improvement over the base set. "      +"\n\nBackward elimination is similar but the initial base set has all "      +"attributes included and races all single attribute deletions. "      +"\n\nSchemata search is a bit different. Each iteration a series of "      +"races are run in parallel. Each race in a set determines whether "      +"a particular attribute should be included or not---ie the race is "      +"between the attribute being \"in\" or \"out\". The other attributes "      +"for this race are included or excluded randomly at each point in the "      +"evaluation. As soon as one race "      +"has a clear winner (ie it has been decided whether a particular "      +"attribute should be inor not) then the next set of races begins, "      +"using the result of the winning race from the previous iteration as "      +"new base set.\n\nRank race first ranks the attributes using an "      +"attribute evaluator and then races the ranking. The race includes "      +"no attributes, the top ranked attribute, the top two attributes, the "      +"top three attributes, etc.\n\nIt is also possible to generate a "      +"raked list of attributes through the forward racing process. "      +"If generateRanking is set to true then a complete forward race will "      +"be run---that is, racing continues until all attributes have been "      +"selected. The order that they are added in determines a complete "      +"ranking of all the attributes.\n\nRacing uses paired and unpaired "      +"t-tests on cross-validation errors of competing subsets. When there "      +"is a significant difference between the means of the errors of two "      +"competing subsets then the poorer of the two can be eliminated from "      +"the race. Similarly, if there is no significant difference between "      +"the mean errors of two competing subsets and they are within some "      +"threshold of each other, then one can be eliminated from the race.\n\n"      + "For more information see:\n\n"      + getTechnicalInformation().toString();  }  /**   * Returns an instance of a TechnicalInformation object, containing    * detailed information about the technical background of this class,   * e.g., paper reference or book this class is based on.   *    * @return the technical information about this class   */  public TechnicalInformation getTechnicalInformation() {    TechnicalInformation        result;        result = new TechnicalInformation(Type.INPROCEEDINGS);    result.setValue(Field.AUTHOR, "Andrew W. Moore and Mary S. Lee");    result.setValue(Field.TITLE, "Efficient Algorithms for Minimizing Cross Validation Error");    result.setValue(Field.BOOKTITLE, "Eleventh International Conference on Machine Learning");    result.setValue(Field.YEAR, "1994");    result.setValue(Field.PAGES, "190-198");    result.setValue(Field.PUBLISHER, "Morgan Kaufmann");        return result;  }  /**   * Returns the tip text for this property   * @return tip text for this property suitable for   * displaying in the explorer/experimenter gui   */  public String raceTypeTipText() {    return "Set the type of search.";  }  /**   * Set the race type   *   * @param d the type of race   */  public void setRaceType (SelectedTag d) {        if (d.getTags() == TAGS_SELECTION) {      m_raceType = d.getSelectedTag().getID();    }    if (m_raceType == SCHEMATA_RACE && !m_rankingRequested) {      try {        setFoldsType(new SelectedTag(LEAVE_ONE_OUT,                                     XVALTAGS_SELECTION));        setSignificanceLevel(0.01);      } catch (Exception ex) {      }    } else {      try {        setFoldsType(new SelectedTag(TEN_FOLD,                                     XVALTAGS_SELECTION));        setSignificanceLevel(0.001);      } catch (Exception ex) {      }    }  }  /**   * Get the race type   *   * @return the type of race   */  public SelectedTag getRaceType() {    return new SelectedTag(m_raceType, TAGS_SELECTION);  }  /**   * Returns the tip text for this property   * @return tip text for this property suitable for   * displaying in the explorer/experimenter gui   */  public String significanceLevelTipText() {    return "Set the significance level to use for t-test comparisons.";  }  /**   * Sets the significance level to use   * @param sig the significance level
racesearch.java - 源码说明

本页面展示了「这是关于数据挖掘的一些算法」中的 racesearch.java 源码文件，采用 Java 编程语言编写，共 1,770 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与数据挖掘相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?