=====================================================================
                               ======
                               README
                               ======

                             WEKA 3.4.4
                            7 March 2005

                 Java Programs for Machine Learning

           Copyright (C) 1998-2005  University of Waikato

                web: http://www.cs.waikato.ac.nz/~ml
=====================================================================

Contents:
---------

1. Using one of the graphical user interfaces in Weka
2. The Weka data format (ARFF)
3. Using Weka from the command line
   - Classifiers
   - Association rules
   - Filters
4. Database access
5. The Experiment package
6. Tutorial
7. Source code
8. Credits
9. Submission of code and bug reports
10. Copyright

----------------------------------------------------------------------

1. Using one of the graphical user interfaces in Weka:
------------------------------------------------------

This assumes that the Weka archive that you have downloaded has been
extracted into a directory containing this README and that you haven't
used an automatic installer (e.g. the one provided for Windows).

Weka 3.4 requires Java 1.4 or higher. Depending on your platform you
may be able to just double-click on the weka.jar icon to run the
graphical user interfaces for Weka. Otherwise, from a command-line
(assuming you are in the directory containing weka.jar), type

java -jar weka.jar

or if you are using Windows use

javaw -jar weka.jar

This will start a small graphical user interface (GUIChooser) from
which you can select the SimpleCLI interface or the more sophisticated
Explorer, Experimenter, and Knowledge Flow interfaces. SimpleCLI just
acts like a simple command shell. The Explorer is currently the main
interface for data analysis using Weka. The Experimenter can be used
to compare the performance of different learning algorithms across
various datasets.
The Knowledge Flow provides a component-based
alternative to the Explorer interface.

Example datasets that can be used with Weka are in the sub-directory
called "data", which should be located in the same directory as this
README file.

The Weka user interfaces provide extensive built-in help facilities
(tool tips, etc.). Documentation for the Explorer can be found in
ExplorerGuide.pdf (also in the same directory as this README).

You can also start the GUIChooser by naming its class explicitly,
with weka.jar on the classpath:

java -classpath weka.jar:$CLASSPATH weka.gui.GUIChooser

or if you are using Windows use

javaw -classpath weka.jar;%CLASSPATH% weka.gui.GUIChooser

----------------------------------------------------------------------

2. The Weka data format (ARFF):
-------------------------------

Datasets for WEKA should be formatted according to the ARFF
format. (However, there are several converters included in WEKA that
can convert other file formats to ARFF. The Weka Explorer will use
these automatically if it doesn't recognize a given file as an ARFF
file.) Examples of ARFF files can be found in the "data" subdirectory.
What follows is a short description of the file format. A more
complete description is available from the Weka web page.

A dataset has to start with a declaration of its name:

@relation name

followed by a list of all the attributes in the dataset (including
the class attribute). These declarations have the form

@attribute attribute_name specification

If an attribute is nominal, specification contains a list of the
possible attribute values in curly brackets:

@attribute nominal_attribute {first_value, second_value, third_value}

If an attribute is numeric, specification is replaced by the keyword
numeric (integer values are treated as real numbers in WEKA):

@attribute numeric_attribute numeric

In addition to these two types of attributes, there also exists a
string attribute type.
This attribute provides the possibility to
store a comment or ID field for each of the instances in a dataset:

@attribute string_attribute string

After the attribute declarations, the actual data is introduced by a

@data

tag, which is followed by a list of all the instances. The instances
are listed in comma-separated format, with a question mark
representing a missing value. Comments are lines starting with % and
are ignored by Weka.

----------------------------------------------------------------------

3. Using Weka from the command line:
------------------------------------

If you want to use Weka from your standard command-line interface
(e.g. bash under Linux):

a) Set WEKAHOME to be the directory which contains this README.
b) Add $WEKAHOME/weka.jar to your CLASSPATH environment variable.
c) Bookmark $WEKAHOME/doc/packages.html in your web browser.

Alternatively you can try using the SimpleCLI user interface available
from the GUI chooser discussed above.

In the following, the names of files assume use of a unix command-line
with environment variables. For other command-lines (including
SimpleCLI) you should substitute the name of the directory where
weka.jar lives for $WEKAHOME. If your platform uses a character other
than / as the path separator, also make the appropriate
substitutions.

===========
Classifiers
===========

Try:

java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff

This prints out a decision tree classifier for the iris dataset
and ten-fold cross-validation estimates of its performance. If you
don't pass any options to the classifier, WEKA will list all the
available options. Try:

java weka.classifiers.trees.J48

The options are divided into "general" options that apply to most
classification schemes in WEKA, and scheme-specific options that only
apply to the current scheme (in this case J48). WEKA has a common
interface to all classification methods. Any class that implements a
classifier can be used in the same way as J48 is used above.
WEKA knows that a class implements a classifier if it extends the
Classifier class in weka.classifiers. Almost all classes in
weka.classifiers fall into this category. Try, for example:

java weka.classifiers.bayes.NaiveBayes -t $WEKAHOME/data/labor.arff

Here is a list of some of the classifiers currently implemented in
weka.classifiers:

a) Classifiers for categorical prediction:

weka.classifiers.lazy.IBk: k-nearest neighbour learner
weka.classifiers.trees.J48: C4.5 decision trees
weka.classifiers.rules.PART: rule learner
weka.classifiers.bayes.NaiveBayes: naive Bayes with/without kernels
weka.classifiers.rules.OneR: Holte's OneR
weka.classifiers.functions.SMO: support vector machines
weka.classifiers.functions.Logistic: logistic regression
weka.classifiers.meta.AdaBoostM1: AdaBoost
weka.classifiers.meta.LogitBoost: logit boost
weka.classifiers.trees.DecisionStump: decision stumps (for boosting)
etc.

b) Classifiers for numeric prediction:

weka.classifiers.functions.LinearRegression: linear regression
weka.classifiers.trees.M5P: model trees
weka.classifiers.rules.M5Rules: model rules
weka.classifiers.lazy.IBk: k-nearest neighbour learner
weka.classifiers.lazy.LWR: locally weighted regression

=================
Association rules
=================

Besides classification schemes, WEKA contains other useful tools.
Association rules, for example, can be extracted using the
Apriori algorithm. Try:

java weka.associations.Apriori -t $WEKAHOME/data/weather.nominal.arff

=======
Filters
=======

There are also a number of tools that allow you to manipulate a
dataset. These tools are called filters in WEKA and can be found
in weka.filters.

weka.filters.unsupervised.attribute.Discretize: discretizes numeric data
weka.filters.unsupervised.attribute.Remove: deletes/selects attributes
etc.

Try:

java weka.filters.supervised.attribute.Discretize -i $WEKAHOME/data/iris.arff -c last

----------------------------------------------------------------------

4. Database access:
-------------------

In terms of database connectivity, you should be able to use any
database with a Java JDBC driver. When using classes that access a
database (e.g. the Explorer), you will probably want to create a
properties file that specifies which JDBC drivers to use, where to
find the database, and how to map the data types. This file
should reside in your home directory or the current directory and be
called "DatabaseUtils.props". An example is provided in
weka/experiment (you need to expand weka.jar to be able to look at
this file). Note that the settings in this bundled example file are
used unless they are overridden by settings in the DatabaseUtils.props
file in your home directory or the current directory (in that order).

----------------------------------------------------------------------

5. The Experiment package:
--------------------------

There is support for running experiments that involve evaluating
classifiers on repeated randomizations of datasets, over multiple
datasets (you can do much more than this, besides). The classes for
this reside in the weka.experiment package. The basic architecture is
that a ResultProducer (which generates results on some randomization
of a dataset) sends results to a ResultListener (which is responsible
for stating whether it already has the result, and otherwise storing
results).

Example ResultListeners include:

weka.experiment.CSVResultListener: outputs results as
comma-separated-value files.
weka.experiment.InstancesResultListener: converts results into a set
of Instances.
weka.experiment.DatabaseResultListener: sends results to a database
via JDBC.
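
The hand-off described at the start of this section (a producer offers
each result to a listener, which first says whether it already has it)
can be sketched in plain Java. This is only an illustration: the names
SketchListener, InMemoryListener, SketchProducer and ExperimentSketch
are made up for this example and are not the classes or method
signatures of weka.experiment, and the result value is a placeholder
rather than a real evaluation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; NOT the weka.experiment API.
interface SketchListener {
    // Returns false when a result for this key is already stored.
    boolean isResultRequired(String key);

    // Stores a result under the given key.
    void acceptResult(String key, double value);
}

// Listener that keeps results in memory and remembers which keys it has.
class InMemoryListener implements SketchListener {
    private final Set<String> seen = new HashSet<>();
    final List<String> rows = new ArrayList<>();

    public boolean isResultRequired(String key) {
        return !seen.contains(key);
    }

    public void acceptResult(String key, double value) {
        seen.add(key);
        rows.add(key + "," + value);
    }
}

// Producer that asks the listener before doing any work for a run.
class SketchProducer {
    private final SketchListener listener;
    int computed = 0; // number of runs actually evaluated

    SketchProducer(SketchListener listener) {
        this.listener = listener;
    }

    void doRun(String dataset, int run) {
        String key = dataset + "#" + run;
        if (!listener.isResultRequired(key)) {
            return; // the listener already holds this result
        }
        computed++;
        double placeholder = run * 0.5; // stands in for a real evaluation
        listener.acceptResult(key, placeholder);
    }
}

class ExperimentSketch {
    public static void main(String[] args) {
        InMemoryListener listener = new InMemoryListener();
        SketchProducer producer = new SketchProducer(listener);
        producer.doRun("iris", 1);
        producer.doRun("iris", 2);
        producer.doRun("iris", 1); // repeated request, skipped
        for (String row : listener.rows) {
            System.out.println(row);
        }
    }
}
```

Running ExperimentSketch prints the two stored rows; the repeated
request for run 1 is skipped because the listener reports that it
already holds that result.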

Example ResultProducers include:

weka.experiment.RandomSplitResultProducer: train/test on a % split
weka.experiment.CrossValidationResultProducer: n-fold cross-validation
weka.experiment.AveragingResultProducer: averages results from another
ResultProducer
weka.experiment.DatabaseResultProducer: acts as a cache for results,
storing them in a database.

The RandomSplitResultProducer and CrossValidationResultProducer make
use of a SplitEvaluator to obtain actual results for a particular
split. Two are provided: ClassifierSplitEvaluator (for nominal
classification) and RegressionSplitEvaluator (for numeric
prediction). Each of these uses a Classifier for actual results
generation.

So, you might have a DatabaseResultListener that is sent results from
an AveragingResultProducer, which produces averages over the n results
produced for each run of an n-fold CrossValidationResultProducer,
which in turn is doing nominal classification through a
ClassifierSplitEvaluator, which uses OneR for prediction. Whew. But
you can combine these things together to do pretty much whatever you
want. You might want to write a LearningRateResultProducer that splits
a dataset into increasing numbers of training instances.

To run a simple experiment from the command line, try:

java weka.experiment.Experiment -r -T datasets/UCI/iris.arff \
     -D weka.experiment.InstancesResultListener \
     -P weka.experiment.RandomSplitResultProducer -- \
     -W weka.experiment.ClassifierSplitEvaluator -- \
     -W weka.classifiers.rules.OneR

(Try "java weka.experiment.Experiment -h" to find out what these
options mean.)

If you have your results as a set of instances, you can perform paired
t-tests using weka.experiment.PairedTTester (use the -h option to find
out what options it needs).

However, all this is much easier if you use the Experimenter GUI.

----------------------------------------------------------------------

6. Tutorial:
------------

A tutorial on how to use WEKA is in $WEKAHOME/Tutorial.pdf.
However, not everything in WEKA is covered in the Tutorial, and the
package structure has changed quite a bit. For a complete list of
packages and classes you have to look at the documentation in
$WEKAHOME/doc/packages.html. In particular, Tutorial.pdf is a draft
from the "Data Mining" book (see our web page), and so only describes
features in the stable 3.0 release.

----------------------------------------------------------------------

7. Source code:
---------------

The source code for WEKA is in $WEKAHOME/weka-src.jar. To expand it,
use the jar utility that's in every Java distribution.

----------------------------------------------------------------------

8. Credits:
-----------

Refer to the web page for a list of contributors:

http://www.cs.waikato.ac.nz/~ml/weka/

----------------------------------------------------------------------

9. Call for code and bug reports:
---------------------------------

If you have implemented a learning scheme, filter, application,
visualization tool, etc., using the WEKA classes, and you think it
should be included in WEKA, send us the code, and we can potentially
put it in the next WEKA distribution.

If you find any bugs, send a fix to mlcontrib@cs.waikato.ac.nz.
If that's too hard, just send a bug report to the wekalist mailing
list.

----------------------------------------------------------------------

10. Copyright:
--------------

WEKA is distributed under the GNU General Public License. Please read
the file COPYING.

----------------------------------------------------------------------