/*
 *  YALE - Yet Another Learning Environment
 *  Copyright (C) 2002, 2003
 *      Simon Fischer, Ralf Klinkenberg, Ingo Mierswa,
 *          Katharina Morik, Oliver Ritthoff
 *      Artificial Intelligence Unit
 *      Computer Science Department
 *      University of Dortmund
 *      44221 Dortmund,  Germany
 *  email: yale@ls8.cs.uni-dortmund.de
 *  web:   http://yale.cs.uni-dortmund.de/
 *
 *  This program is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU General Public License as
 *  published by the Free Software Foundation; either version 2 of the
 *  License, or (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 *  General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
 *  USA.
 */

package edu.udo.cs.yale.operator.time;

import edu.udo.cs.yale.operator.parameter.*;
import edu.udo.cs.yale.operator.OperatorException;
import edu.udo.cs.yale.operator.FatalException;
import edu.udo.cs.yale.MethodNotSupportedException;
import edu.udo.cs.yale.tools.LogService;
import edu.udo.cs.yale.tools.RandomGenerator;
import edu.udo.cs.yale.tools.Ontology;
import edu.udo.cs.yale.example.Attribute;
import edu.udo.cs.yale.example.Example;
import edu.udo.cs.yale.example.ExampleSet;
import edu.udo.cs.yale.example.BatchedExampleSet;
import edu.udo.cs.yale.example.ExampleReader;
import edu.udo.cs.yale.operator.Value;
import edu.udo.cs.yale.operator.Operator;
import edu.udo.cs.yale.operator.OperatorChain;
import edu.udo.cs.yale.operator.ValidationChain;
import edu.udo.cs.yale.operator.IllegalInputException;
import edu.udo.cs.yale.operator.IOContainer;
import edu.udo.cs.yale.operator.IOObject;
import edu.udo.cs.yale.operator.IODescription;
import edu.udo.cs.yale.operator.learner.Model;
import edu.udo.cs.yale.operator.performance.PerformanceVector;
import edu.udo.cs.yale.operator.performance.RunVector;
import edu.udo.cs.yale.operator.performance.SeriesVector;
import edu.udo.cs.yale.operator.performance.PerformanceCriterion;
import edu.udo.cs.yale.operator.performance.EstimatedPerformance;
import java.util.*;
import java.io.*;

/** A <tt>ConceptDriftSimulator</tt> encapsulates a concept drift experiment
 *  (as, for example, in [Klinkenberg/Joachims/2000]).
 *  This operator simulates the interest of a user in examples over time.
 *  Within the simulation, the examples are assumed to arrive in batches, i.e.
 *  a fixed number of examples is assumed to arrive at each single point in time.
 *  The examples (e.g. text documents) may originate from several data streams
 *  (e.g. news topics), and the probability that an example is relevant depends
 *  on the stream the example comes from and on the current point in time (batch).
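 *
 *  <p>For illustration only, the relevance model described above can be sketched as a small
 *  stand-alone program. This is a hypothetical sketch, not the implementation of this class:
 *  the class <tt>RelevanceSketch</tt> and all of its methods are made up for this example.
 *  It assumes that (a) the values after the ":" in a <tt>data_stream_relevance</tt> line are
 *  per-batch probabilities in batch order, (b) the relevance of each example is drawn
 *  independently with that probability, and (c) the batch size is computed with integer
 *  division, so leftover examples are ignored.</p>
 *  <pre>{@code
```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-alone sketch (not YALE code) of the simulation setting:
// data streams, batches and per-stream, per-batch relevance probabilities.
final class RelevanceSketch {

    // Parses a data_stream_relevance value: one stream per line, of the form
    // "StreamName : p0 p1 p2 ..." with probabilities separated by whitespace.
    static Map<String, double[]> parseRelevance(String spec) {
        Map<String, double[]> table = new HashMap<>();
        for (String line : spec.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // lines without "name :" carry no specification
            }
            String stream = line.substring(0, colon).trim();
            String rest = line.substring(colon + 1).trim();
            String[] tokens = rest.isEmpty() ? new String[0] : rest.split("\\s+");
            double[] probs = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                probs[i] = Double.parseDouble(tokens[i]);
            }
            table.put(stream, probs);
        }
        return table;
    }

    // Probability that an example from `stream` is relevant in batch `batch`;
    // every probability not explicitly specified counts as 0.0.
    static double relevance(Map<String, double[]> table, String stream, int batch) {
        double[] probs = table.get(stream);
        if (probs == null || batch < 0 || batch >= probs.length) {
            return 0.0;
        }
        return probs[batch];
    }

    // Number of examples per batch: total examples divided by number of batches
    // (integer division is an assumption of this sketch).
    static int batchSize(int numberOfExamples, int numberOfBatches) {
        return numberOfExamples / numberOfBatches;
    }

    // Draws the simulated user's relevance judgement for one example.
    // nextDouble() is in [0, 1), so p = 1.0 is always relevant, p = 0.0 never.
    static boolean drawRelevant(Map<String, double[]> table, String stream,
                                int batch, Random random) {
        return random.nextDouble() < relevance(table, stream, batch);
    }

    public static void main(String[] args) {
        String spec = "Topic1 : 1.0  1.0  0.0  0.0\n"
                    + "Topic3 : 0.0  0.0  1.0  1.0\n"
                    + "Topic5 : 0.0  0.5  0.5  0.0";
        Map<String, double[]> table = parseRelevance(spec);
        System.out.println(relevance(table, "Topic3", 2));   // 1.0
        System.out.println(relevance(table, "Topic4", 0));   // 0.0 (stream not specified)
        System.out.println(relevance(table, "Topic1", 10));  // 0.0 (batch not specified)
        System.out.println(batchSize(1000, 20));             // 50
        System.out.println(drawRelevant(table, "Topic1", 0, new Random(42))); // true
    }
}
```
 *  }</pre>
 *  A map keyed by stream name makes the "unspecified means 0.0" rule a simple
 *  fallback for missing streams and for batches beyond the listed values.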
 *  The operator requires an example set as input and returns as its output a
 *  <tt>PerformanceVector</tt> of the performance results obtained by the enclosed
 *  classification learner/applier chains, averaged over all batches and all runs,
 *  and a <tt>RunVector</tt> of the results for each batch, averaged over all runs.
 *  The latter allows one to observe the performance of the enclosed learner/applier
 *  chains over time.
 *  The number of runs used for averaging can be specified; it is comparable to the
 *  number of folds (<tt>number_of_validations</tt>) in a cross-validation experiment
 *  (see {@link edu.udo.cs.yale.operator.XValidation}).
 *
 *  The first inner operator (or operator chain) must be a classification learner
 *  and the second a corresponding classification model applier and evaluator chain,
 *  usually consisting of a model applier and a performance evaluator, which returns
 *  a performance vector.
 *
 *  <h4>Inner operators enclosed in ConceptDriftSimulator:</h4>
 *  The definition of this operator in the Yale configuration file for the
 *  experiment must contain <i>two operators</i> of the following types:
 *  <ol>
 *    <li>First, a classification learning chain that delivers a learned model
 *        (<tt>Model</tt>);</li>
 *    <li>Second, a classification applier chain able to use this model to predict
 *        the labels of new examples and evaluate them.
 *        This chain delivers a <tt>PerformanceVector</tt> (describing the performance
 *        of the classification model in one run on one batch).</li>
 *  </ol>
 *
 *  <h4>Parameters:</h4>
 *  <ul>
 *    <li><b>number_of_runs</b>: specifies how often the concept drift simulation
 *                               should be repeated for computing the average results
 *                               (similar to the number of folds (<tt>number_of_validations</tt>)
 *                               in cross-validation (<tt>XValidation</tt>)).</li>
 *    <li><b>number_of_batches</b>: specifies the number of time steps to be simulated;
 *                                  the size of a batch, i.e. the number of examples in a
 *                                  batch, is the total number of examples divided by the
 *                                  number of batches.</li>
 *    <li><b>number_of_streams</b>: specifies the number of data streams the examples come
 *                                  from.</li>
 *    <li><b>data_stream_names</b>: specifies the names of the data streams the examples come
 *                                  from (i.e. the possible values of the class label attribute).</li>
 *    <li><b>data_stream_relevance</b>: specifies the probability for examples to be relevant
 *                                      to the simulated user interest, depending on the data
 *                                      stream they come from and the current batch; all
 *                                      probabilities not explicitly specified are considered
 *                                      to be <tt>0.0</tt>.
 *                                      For the value of this parameter to be parsed correctly,
 *                                      each line of this value must contain the specification
 *                                      for exactly one stream.
 *                                      Each such line should start with the stream name,
 *                                      followed by ":" and the probability values for the
 *                                      examples from that stream, separated by whitespace.
 *        </li>
 *    <li><b>learner_type</b>: type of the enclosed learner:
 *        <ul>
 *          <li><i>static</i>:        static learner to be used on all old data (= full memory approach).</li>
 *          <li><i>static_window</i>: static learner to be used on a fixed time window over the old data
 *                                    (= no memory approach for window size 1,
 *                                    or another fixed window size approach otherwise).</li>
 *          <li><i>adaptive</i>:      adaptive learner that maintains an adaptive time window
 *                                    or example weighting by itself.</li>
 *        </ul>
 *        </li>
 *    <li><b>window_size</b>: size of the fixed time window in number of batches; this parameter
 *                            is only considered if the learner type is <tt>static_window</tt>
 *                            and is ignored otherwise.
 *        </li>
 *  </ul>
 *
 *  <h4>Operator-Input</h4>
 *  <ol>
 *    <li><tt>ExampleSet</tt>: set of examples to be used for the concept drift simulation
 *                             experiment; the class label attribute of each example must
 *                             contain the name of the data stream the example originates from.</li>
 *  </ol>
 *  <h4>Operator-Output</h4>
 *  <ol>
 *    <li><tt>PerformanceVector</tt> of performance results (<tt>PerformanceCriterion</tt>),
 *        averaged over all batches and runs.</li>
 *    <li><tt>RunVector</tt> containing one <tt>PerformanceVector</tt> for each batch with the
 *        average performance on this batch, averaged over all runs.</li>
 *  </ol>
 *
 *  <h4>Values:</h4>
 *  <ul>
 *    <li><tt>performance</tt> returns the current performance criterion value</li>
 *    <li><tt>variance</tt> returns the current performance criterion variance (or standard deviation)</li>
 *    <li><tt>run</tt> returns the number of the current run</li>
 *  </ul>
 *
 *  <h4>Example configuration of this operator in an experiment chain (Yale configuration file in XML format):</h4>
 *  <pre>
 *  &lt;operator name="GlobalExperimentChain" class="OperatorChain"&gt;
 *    &lt;parameter key="logfile"         value="Log.ConceptDrift.txt"/&gt;
 *    &lt;parameter key="logverbosity"    value="0"/&gt;
 *    &lt;parameter key="resultfile"      value="Result.ConceptDrift.txt"/&gt;
 *    &lt;parameter key="temp_dir"        value="./tmp"/&gt;
 *    &lt;parameter key="keep_temp_files" value="all"/&gt;
 *
 *    &lt;!-- Read document vectors --&gt;
 *    &lt;operator name="TrecExampleSetSource" class="SparseFormatExampleSource"&gt;
 *      &lt;parameter key="attribute_file" value="./data.sparse.values"/&gt;
 *      &lt;parameter key="label_file"     value="./data.sparse.labels"/&gt;
 *      &lt;parameter key="dimension"      value="1000"/&gt;  &lt;!-- use only first 1000 attributes --&gt;
 *    &lt;/operator&gt;
 *
 *    &lt;!-- Start concept drift simulation --&gt;
 *    &lt;operator name="MyConceptDriftSimulation" class="ConceptDriftSimulator"&gt;
 *      &lt;parameter key="number_of_runs"        value="10"/&gt;
 *      &lt;parameter key="number_of_batches"     value="20"/&gt;
 *      &lt;parameter key="number_of_streams"     value="5"/&gt;
 *      &lt;parameter key="data_stream_names"     value="Topic1 Topic3 Topic4 Topic5 Topic6"/&gt;
 *      &lt;parameter key="data_stream_relevance"
 *                 value="Topic1 : 1.0  1.0  0.0  0.0
 *                        Topic3 : 0.0  0.0  1.0  1.0
 *                        Topic5 : 0.0  0.5  0.5  0.0"/&gt;
 *      &lt;parameter key="learner_type" value="static_window"/&gt;  &lt;!-- use a fixed time window ...     --&gt;
 *      &lt;parameter key="window_size"  value="3"/&gt;              &lt;!-- ... of the fixed size 3 batches --&gt;
 *
 *      &lt;!-- mySVM parameters for the following learning and application chains  --&gt;
 *      &lt;parameter key="pattern"   value=""/&gt;
 *      &lt;parameter key="type"      value="dot"/&gt;
 *      &lt;parameter key="C"         value="1000"/&gt;
 *      &lt;parameter key="epsilon"   value="0.1"/&gt;
 *      &lt;parameter key="verbosity" value="0"/&gt;
 *      &lt;parameter key="sparse"              value="true"/&gt;
 *      &lt;parameter key="weighted_examples"   value="true"/&gt;
 *      &lt;parameter key="xi_alpha_estimation" value="true"/&gt;
 *
 *      &lt;!-- Learning chain with time step model finder --&gt;
 *      &lt;operator name="TimeWinLearner" class="OperatorChain"&gt;
 *        &lt;operator name="Learner" class="SVMLearner" parentlookup="2"/&gt;
 *        &lt;operator name="EstimationResultWriter" class="ResultWriter"/&gt;
 *      &lt;/operator&gt;
 *
 *      &lt;!-- Application and evaluation chain (identical to static case, receives TimeExampleSet/Reader) --&gt;
 *      &lt;operator name="ConceptDriftApplierChain" class="OperatorChain"&gt;
 *        &lt;operator name="Applier" class="SVMApplier" parentlookup="2"/&gt;
 *        &lt;operator name="PerfEvaluator" class="PerformanceEvaluator"&gt;
 *          &lt;parameter key="criteria_list" value="classification_error"/&gt;
 *        &lt;/operator&gt;
 *        &lt;operator name="RunResultWriter" class="ResultWriter"/&gt;
 *      &lt;/operator&gt;
 *
 *    &lt;/operator&gt;  &lt;!-- end of ConceptDriftSimulation --&gt;
 *
 *    &lt;operator name="AverageResultWriter" class="ResultWriter"/&gt;
 *
 *  &lt;/operator&gt;  &lt;!-- end of GlobalExperimentChain --&gt;
 *  </pre>
 *
 *
 *  <p><i>Class name for operator instantiation in Yale configuration files:</i> <b>ConceptDriftSimulator</b></p>
 *
 *  <p><i>Bibliography:</i><br>
 *    <b>[Klinkenberg/Joachims/2000]</b> Ralf Klinkenberg and Thorsten Joachims.
 *       <i>Detecting Concept Drift with Support Vector Machines</i>.
 *       In Proceedings of the Seventeenth International Conference on Machine Learning (ICML),
 *       pages 487-494, Morgan Kaufmann, San Francisco, CA, USA, 2000.<br>
 *       [<a href="http://www-ai.cs.uni-dortmund.de/DOKUMENTE/klinkenberg_joachims_2000a.ps.gz">Postscript (gz)</a>]
 *       &nbsp;
 *       [<a href="http://www-ai.cs.uni-dortmund.de/DOKUMENTE/klinkenberg_joachims_2000a.pdf.gz">PDF (gz)</a>]<br>
 *  </p>
 *
 *  @see edu.udo.cs.yale.operator.XValidation
 *
 *  @author  Ralf Klinkenberg