📄 conceptdriftsimulator.java.old
/*
 *  YALE - Yet Another Learning Environment
 *  Copyright (C) 2002, 2003
 *      Simon Fischer, Ralf Klinkenberg, Ingo Mierswa,
 *          Katharina Morik, Oliver Ritthoff
 *      Artificial Intelligence Unit
 *      Computer Science Department
 *      University of Dortmund
 *      44221 Dortmund, Germany
 *  email: yale@ls8.cs.uni-dortmund.de
 *  web:   http://yale.cs.uni-dortmund.de/
 *
 *  This program is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU General Public License as
 *  published by the Free Software Foundation; either version 2 of the
 *  License, or (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 *  General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
 *  USA.
 */
package edu.udo.cs.yale.operator.time;

import edu.udo.cs.yale.operator.parameter.*;
import edu.udo.cs.yale.operator.OperatorException;
import edu.udo.cs.yale.operator.FatalException;
import edu.udo.cs.yale.MethodNotSupportedException;
import edu.udo.cs.yale.tools.LogService;
import edu.udo.cs.yale.tools.RandomGenerator;
import edu.udo.cs.yale.tools.Ontology;
import edu.udo.cs.yale.example.Attribute;
import edu.udo.cs.yale.example.Example;
import edu.udo.cs.yale.example.ExampleSet;
import edu.udo.cs.yale.example.BatchedExampleSet;
import edu.udo.cs.yale.example.ExampleReader;
import edu.udo.cs.yale.operator.Value;
import edu.udo.cs.yale.operator.Operator;
import edu.udo.cs.yale.operator.OperatorChain;
import edu.udo.cs.yale.operator.ValidationChain;
import edu.udo.cs.yale.operator.IllegalInputException;
import edu.udo.cs.yale.operator.IOContainer;
import edu.udo.cs.yale.operator.IOObject;
import edu.udo.cs.yale.operator.IODescription;
import edu.udo.cs.yale.operator.learner.Model;
import edu.udo.cs.yale.operator.performance.PerformanceVector;
import edu.udo.cs.yale.operator.performance.RunVector;
import edu.udo.cs.yale.operator.performance.SeriesVector;
import edu.udo.cs.yale.operator.performance.PerformanceCriterion;
import edu.udo.cs.yale.operator.performance.EstimatedPerformance;

import java.util.*;
import java.io.*;

/** A <tt>ConceptDriftSimulator</tt> encapsulates a concept drift experiment
 *  (like for example in [Klinkenberg/Joachims/2000]).
 *  This operator simulates the interest of a user in examples over time.
 *  Within the simulation, the examples are assumed to arrive in batches, i.e.
 *  a fixed number of examples is assumed to arrive at each single point in time.
 *  The examples (e.g. text documents) may originate from several data streams
 *  (e.g. news topics) and the probability of an example to be relevant depends
 *  on the stream the example comes from and on the current point in time (batch).
 *  The operator requires as input an example set and returns as its output a
 *  <tt>PerformanceVector</tt> of the performance results obtained by the enclosed
 *  classification learner/applier chains averaged over all batches and all runs
 *  and a <tt>RunVector</tt> of the result for each batch averaged over all runs.
 *  The latter makes it possible to observe the performance of the enclosed
 *  learner/applier chains over time.
 *  The number of runs for averaging can be specified and is comparable to the number
 *  of folds (<tt>number_of_validations</tt>) in a cross-validation experiment
 *  (@see edu.udo.cs.yale.operator.XValidation).
 *
 *  The first inner operator (or operator chain) must be a classification learner
 *  and the second a corresponding classification model applier and evaluator chain,
 *  usually consisting of a model applier and a performance evaluator, which returns
 *  a performance vector.
 *
 *  <h4>Parameter operators enclosed in ConceptDriftSimulator:</h4>
 *  The parametrization of this operator in the Yale configuration file for the
 *  experiment has to contain <i>two operators</i> of the following types:
 *  <ol>
 *    <li>First a classification learning chain that delivers a learned model
 *        (<tt>Model</tt>);</li>
 *    <li>Second a classification applier chain able to use this model to predict
 *        the labels of new examples and evaluate them. This chain delivers a
 *        <tt>PerformanceVector</tt> (describing the performance of the classification
 *        model in one run on one batch).</li>
 *  </ol>
 *
 *  <h4>Parameters:</h4>
 *  <ul>
 *    <li><b>number_of_runs</b>: specifies how often the concept drift simulation
 *        should be repeated for computing the average results
 *        (similar to number of folds (<tt>number_of_validations</tt>)
 *        in cross-validation (<tt>XValidation</tt>)).</li>
 *    <li><b>number_of_batches</b>: specifies the number of time steps to be simulated;
 *        the size of a batch, i.e. the number of examples in a
 *        batch, is the total number of examples divided by the
 *        number of batches.</li>
 *    <li><b>number_of_streams</b>: specifies the number of data streams the examples come
 *        from.</li>
 *    <li><b>data_stream_names</b>: specifies the names of data streams the examples come
 *        from (i.e. the possible values of the class label attribute).</li>
 *    <li><b>data_stream_relevance</b>: specifies the probability for examples to be relevant
 *        to the simulated user interest depending on the data
 *        stream they come from and the current batch; all
 *        probabilities not explicitly specified are considered
 *        to be <tt>0.0</tt>.
 *        For the value of this parameter to be parsed correctly,
 *        each line of this value should contain the specification
 *        for exactly one stream. Each such line should start with
 *        the stream name followed by ":" and the probability values
 *        for the examples from that stream separated by whitespace.
 *    </li>
 *    <li><b>learner_type</b>: type of the enclosed learner:
 *        <ul>
 *          <li><i>static</i>: static learner to be used on all old data (= full memory approach).</li>
 *          <li><i>static_window</i>: static learner to be used on a fixed time window on the old data
 *              (= no memory approach for window size 1,
 *              or other fixed window size approach otherwise).</li>
 *          <li><i>adaptive</i>: adaptive learner that maintains an adaptive time window
 *              or example weighting by itself.</li>
 *        </ul>
 *    </li>
 *    <li><b>window_size</b>: size of the fixed time window in number of batches; this parameter
 *        is only considered if the learner type is <tt>static_window</tt>,
 *        and ignored otherwise.
 *    </li>
 *  </ul>
 *
 *  <h4>Operator-Input</h4>
 *  <ol>
 *    <li><tt>ExampleSet</tt>: set of examples to be used for the concept drift simulation
 *        experiment; the class label attribute of an example must
 *        contain the name of the data stream the example originates from.</li>
 *  </ol>
 *  <h4>Operator-Output</h4>
 *  <ol>
 *    <li><tt>PerformanceVector</tt> of averaged performance results (<tt>PerformanceCriterion</tt>),
 *        averaged over all batches and runs.</li>
 *    <li><tt>PerformanceRun</tt> containing one <tt>PerformanceVector</tt> for each batch with the
 *        average performance on this batch, averaged over all runs.</li>
 *  </ol>
 *
 *  <h4>Values:</h4>
 *  <ul>
 *    <li><tt>performance</tt> returns the current performance criterion value</li>
 *    <li><tt>variance</tt> returns the current performance criterion variance (or standard deviation)</li>
 *    <li><tt>run</tt> returns the number of the current run</li>
 *  </ul>
 *
 *  <h4>Example configuration of this operator in an experiment chain (Yale configuration file in XML format):</h4>
 *  <pre>
 *  <operator name="GlobalExperimentChain" class="OperatorChain">
 *    <parameter key="logfile"          value="Log.ConceptDrift.txt"/>
 *    <parameter key="logverbosity"     value="0"/>
 *    <parameter key="resultfile"       value="Result.ConceptDrift.txt"/>
 *    <parameter key="temp_dir"         value="./tmp"/>
 *    <parameter key="keep_temp_files"  value="all"/>
 *
 *    <!-- Read document vectors -->
 *    <operator name="TrecExampleSetSource" class="SparseFormatExampleSource">
 *      <parameter key="attribute_file"  value="./data.sparse.values"/>
 *      <parameter key="label_file"      value="./data.sparse.labels"/>
 *      <parameter key="dimension"       value="1000"/>   <!-- use only first 1000 attributes -->
 *    </operator>
 *
 *    <!-- Start concept drift simulation -->
 *    <operator name="MyConceptDriftSimulation" class="ConceptDriftSimulator">
 *      <parameter key="number_of_runs"         value="10"/>
 *      <parameter key="number_of_batches"      value="20"/>
 *      <parameter key="number_of_streams"      value="5"/>
 *      <parameter key="data_stream_names"      value="Topic1 Topic3 Topic4 Topic5 Topic6"/>
 *      <parameter key="data_stream_relevance"
 *                 value="Topic1 : 1.0 1.0 0.0 0.0
 *                        Topic3 : 0.0 0.0 1.0 1.0
 *                        Topic5 : 0.0 0.5 0.5 0.0"/>
 *      <parameter key="learner_type"           value="static_window"/>  <!-- use a fixed time window ...  -->
 *      <parameter key="window_size"            value="3"/>              <!-- ... of the fixed size 3 batches -->
 *
 *      <!-- mySVM parameters for the following learning and application chains -->
 *      <parameter key="pattern"              value=""/>
 *      <parameter key="type"                 value="dot"/>
 *      <parameter key="C"                    value="1000"/>
 *      <parameter key="epsilon"              value="0.1"/>
 *      <parameter key="verbosity"            value="0"/>
 *      <parameter key="sparse"               value="true"/>
 *      <parameter key="weighted_examples"    value="true"/>
 *      <parameter key="xi_alpha_estimation"  value="true"/>
 *
 *      <!-- Learning chain with time step model finder -->
 *      <operator name="TimeWinLearner" class="OperatorChain">
 *        <operator name="Learner" class="SVMLearner" parentlookup="2"/>
 *        <operator name="EstimationResultWriter" class="ResultWriter"/>
 *      </operator>
 *
 *      <!-- Application and evaluation chain (identical to static case, receives TimeExampleSet/Reader) -->
 *      <operator name="ConceptDriftApplierChain" class="OperatorChain">
 *        <operator name="Applier" class="SVMApplier" parentlookup="2"/>
 *        <operator name="PerfEvaluator" class="PerformanceEvaluator">
 *          <parameter key="criteria_list" value="classification_error"/>
 *        </operator>
 *        <operator name="RunResultWriter" class="ResultWriter"/>
 *      </operator>
 *
 *    </operator>  <!-- end of ConceptDriftSimulation -->
 *
 *    <operator name="AverageResultWriter" class="ResultWriter"/>
 *
 *  </operator>  <!-- end of GlobalExperimentChain -->
 *  </pre>
 *
 *
 *  <p><i>Class name for operator instantiation in Yale configuration files:</i> <b>ConceptDriftSimulator</b></p>
 *
 *  <p><i>Bibliography:</i><br>
 *     <b>[Klinkenberg/Joachims/2000]</b> Ralf Klinkenberg and Thorsten Joachims.
 *     <i>Detecting Concept Drift with Support Vector Machines</i>.
 *     In Proceedings of the Seventeenth International Conference on Machine Learning (ICML),
 *     pages 487-494, Morgan Kaufmann, San Francisco, CA, USA, 2000.<br>
 *     [<a href="http://www-ai.cs.uni-dortmund.de/DOKUMENTE/klinkenberg_joachims_2000a.ps.gz">Postscript (gz)</a>]
 *
 *     [<a href="http://www-ai.cs.uni-dortmund.de/DOKUMENTE/klinkenberg_joachims_2000a.pdf.gz">PDF (gz)</a>]<br>
 *  </p>
 *
 *  @see edu.udo.cs.yale.operator.XValidation
 *
 *  @author Ralf Klinkenberg