<td>
<pre>0</pre>
</td>
</tr>
<tr>
<td>
<pre>2</pre>
</td>
<td>
<pre>0.76</pre>
</td>
<td>
<pre>1</pre>
</td>
<td>
<pre>0</pre>
</td>
</tr>
<tr>
<td>
<pre>3</pre>
</td>
<td>
<pre>0.60</pre>
</td>
<td>
<pre>1</pre>
</td>
<td>
<pre>1</pre>
</td>
</tr>
<tr>
<td>
<pre>4</pre>
</td>
<td>
<pre>0.79</pre>
</td>
<td>
<pre>0</pre>
</td>
<td>
<pre>1</pre>
</td>
</tr>
<tr>
<td>
<pre>5</pre>
</td>
<td>
<pre>0.52</pre>
</td>
<td>
<pre>1</pre>
</td>
<td>
<pre>0</pre>
</td>
</tr>
<tr>
<td>
<pre>6</pre>
</td>
<td>
<pre>0.67</pre>
</td>
<td>
<pre>0</pre>
</td>
<td>
<pre>0</pre>
</td>
</tr>
</table>
<FONT
color=#000000>
<P> </P>
<P>The file $(input_file).result contains the overall prediction statistics for
the test set. A sample of the output format is:</P>
<pre>Processing Filename: demo/DataExample1.txt
Classifier:kNN_classify -k 3
Message: Cross Validation, Folder: 3, Classification,
Error = 0.234679, Precision = 0.375293, Recall = 0.337795, F1 = 0.354157, MAP = 0.289376, MBAP = 0.209378</pre>
</FONT>
<H2>Options and Classifiers<br>
</H2>
<p><font color="#000000">The basic grammar for MATLABArsenal's command is as follows.
Note that "--" is used to separate different parts of the input commands.
<br>
Do not forget to add spaces before and after the "--" otherwise the
wrappers cannot parse the command correctly.</font></p>
<pre><font color="#000000"> test_classify('classify -t input_file [general_option] [-- EvaluationMethod [evaluation_options]] ...
[-- ClassifierWrapper [param] ] -- BaseClassifier [param] ); </font></pre>
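<p><font color="#000000">For instance, a minimal command following this grammar (an
illustrative sketch only, reusing the demo data file shipped with the toolkit) would be:</font></p>
<pre>% illustrative sketch of the basic grammar; option values are examples only
test_classify('classify -t demo/DataExample1.txt -- cross_validate -t 3 -- kNN_classify -k 1');</pre>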
<p><font color="#000000">More details for the available options, classifiers and
their default values.</font></p>
<blockquote>
<p><font color="#000000">1. The general options</font></p>
<blockquote>
<pre><font color="#000000"> -v (def 1): preprocess.Vebosity, vebosity of messages<br> -sf (def 0): preprocess.Shuffled, shuffle the data or not. 0 for no shuffling<br> -n (def 1): preprocess.Normalization, normalize the data or not. 1 for normalizing<br> -sh (def -1): preprocess.ShotAvailable, shot information available or not. -1 for automatical detection<br> -vs (def 1): preprocess.ValidateByShot, <br> -ds (def 0): preprocess.DataSampling, do data sampling or not. 0 for none<br> -dsr (def 0): preprocess.DataSamplingRate, data sampling rate <br> -svd (def 0): preprocess.SVD, SVD dimension reduction. The parameter is number of reduced dimension<br> -fld (def 0): preprocess.FLD, FLD dimension reduction. The parameter is number of reduced dimension<br> -map (def 0): preprocess.ComputeMAP, report mean average precision<br> -if (def 0): preprocess.InputFormat, the input formats, either 0 or 1 <br> -of (def 0): preprocess.OutputFormat, the output formats, either 0 or 1 <br> -pf (def 0): preprocess.PredFormat, the prediction file formats, either 0 or 1 <br> -chi (def 0): preprocess.ChiSquare, feature selection using chi-squared measure. <br>
-t (def ''): preprocess.input_file, the input file name<br> -o (def ''): preprocess.output_file, the output file name<br> -p (def ''): preprocess.pred_file, the prediction file name<br> -oflag(def 'a'):preprocess.OutputFlag, output flag. 'a' for appending, 'w' for 'overwriting'<br> -dir (def ''): preprocess.WorkingDir, the working directory, which is $MATLABArsenalRoot<br> -drf (def ''): preprocess.DimReductionFile, the intermediate file for dimension reduction</font></pre>
</blockquote>
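<p><font color="#000000">As an illustrative sketch (the flag values here are chosen
for the example only), several of these general options can be combined in a single
command, e.g. shuffling, normalizing, reducing the data to 10 dimensions with SVD,
and reporting mean average precision:</font></p>
<pre>% illustrative sketch combining several general options; values are examples only
test_classify('classify -t DataExample1.txt -sf 1 -n 1 -svd 10 -map 1 -- cross_validate -t 3 -- kNN_classify -k 3');</pre>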
<p><font color="#000000">2. The evaluation methods. The default method is to
split the input file equally into training and testing sets. </font></p>
<pre><font color="#000000"> train_test_validate(default method): split the input data into training set and testing set<br> options: -t (def -2): The training-testing splitting boundary for the data set<br>
cross_validate: cross validation<br> options: -t (def 3): The folder for cross-validation<br>
test_file_validate: use the input file as the training set, the additional file as testing set<br> options: -t (def ''): The additional testing file<br>
train_only: use the input file for training only<br> options: -m (def ''): The output model file<br> </font><font color="#000000">
test_only: use the input file for testing only<br> options: -m (def ''): The input model file<br></font></pre>
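<p><font color="#000000">For instance, a sketch of test_file_validate, training on one
file and testing on another (the file names follow the demo data used in the examples
below):</font></p>
<pre>% illustrative sketch of test_file_validate; file names follow the demo data
test_classify('classify -t DataExample1.train.txt -- test_file_validate -t DataExample1.test.txt -- kNN_classify -k 3');</pre>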
<p>3<font color="#000000">. The multiclass classification wrappers. By default
no wrappers are applied. </font></p>
<pre><font color="#000000"> train_test_simple(default method): no wrappers are applied<br>
train_test_multiple_class: multi-class classification wrappers<br> options: -CodeType (def 0): Coding schemes. 0 is one-against-all, 1 is pairwise coupling, 2 is ECOC-16
-LossFuncType (def 2): The type of loss functions. 0 is logist loss, 1 is exp loss, 2 is hinger loss
train_test_multiple_label: multi-label classification wrappers<br>
train_test_multiple_class_AL: multi-class classification wrappers with active learning<br> options: -CodeType (def 0): Coding schemes. 0 is one-against-all, 1 is pairwise coupling, 2 is ECOC-16
-LossFuncType (def 2): The type of loss functions. 0 is logist loss, 1 is exp loss, 2 is hinger loss
-ALIter (def 4): Iterations for active learning
-ALIncrSize (def 10): Incremental size per iteration<br></font></pre>
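<p><font color="#000000">An illustrative sketch of the multi-class wrapper, pairing
the ECOC-16 coding scheme and logistic loss with a linear SVM (the parameter choices
are examples only):</font></p>
<pre>% illustrative sketch of a multi-class wrapper; parameter choices are examples only
test_classify('classify -t DataExample2.txt -- cross_validate -t 3 -- train_test_multiple_class -CodeType 2 -LossFuncType 0 -- SVM_LIGHT -Kernel 0');</pre>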
<p>4<font color="#000000">. The classifier wrappers. By default no wrappers
are applied. </font></p>
<pre><font color="#000000"> WekaClassify (para) -- (Additional classifier and options): WEKA classification
options: -MultiClassWrapper (def -1): Multi-class wrapper for WEKA. 1 for activation and 0 for deactivation. -1 for automatcially select
MCActiveLearning: Active learning module<br> options: -Iter (def 10): Iterations for active learning
-IncrSize (def 10): Incremental size per iteration
MCAdaBoostM1: AdaBoost.M1<br> options: -Iter (def 10): Iterations for AdaBoost
-SampleRatio (def 1): The ratio of data to be resampled per iteration. 1 means 100% of the data is resampled
MCBagging: Bagging<br> options: -Iter (def 10): Iterations for Bagging
-SampleRatio (def 1): The ratio of data to be resampled per iteration. 1 means 100% of the data is resampled
MCDownSampling: Down sampling <br> options: -PosNegRatio (def 0.5): The ratio of positive and negative data after sampling
MCUpSampling: Up sampling <br> options: -PosNegRatio (def 0.5): The ratio of positive and negative data after sampling
MCHierarchyClassify (para) -- (Meta Classifer, para) [-- BaseClassifer]: Hierarchial classification, using the meta classifier on top
options: -PosNegRatio (def 0.5): The ratio of positive and negative data after sampling
-SampleDevSet (def 0): Whether use a sampled development set to learn meta classifier or not. 0 is not.
MCWithMultiFSet: Hierarchial classification on multiple groups of features. See example 5<br> options: -Voting (def 0): Use sum rule or majority voting to combine. 0 is sum rule.<br> -Separator (def 0): Separators for multiple feature groups.</font></pre>
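<p><font color="#000000">As an illustrative sketch (the iteration count and sample
ratio are example values), a wrapper such as Bagging can be placed between the
evaluation method and any base classifier:</font></p>
<pre>% illustrative sketch of a classifier wrapper; -Iter and -SampleRatio are example values
test_classify('classify -t DataExample1.txt -sf 1 -- cross_validate -t 3 -- MCBagging -Iter 20 -SampleRatio 0.8 -- kNN_classify -k 3');</pre>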
<font color="#000000"><br>
</font>5<font color="#000000">. The base classifiers. </font>
<pre><font color="#000000"> SVM_LIGHT: SVM_light classification
options: -Kernel (def 0): Kernel Type. 0 for linear, 1 for polynomial, 2 for RBF, 3 for sigmoid
-KernelParam (def 0.05): Kernel Parameter. d for polynomial, g for RBF, a/b for sigmoid
-CostFactor (def 1): Cost Factor, roughly the ratio of positive and negative data
-Threshold (def 0): Classification threshold. Classified as positive if larger than the threshold
SVM_LIGHT_TRANSDUCTIVE: SVM_light transductive classification
options: -Kernel (def 0): Kernel Type. 0 for linear, 1 for polynomial, 2 for RBF, 3 for sigmoid
-KernelParam (def 0.05): Kernel Parameter. d for polynomial, g for RBF, a/b for sigmoid
-CostFactor (def 1): Cost Factor, roughly the ratio of positive and negative data
-Threshold (def 0): Classification threshold. Classified as positive if larger than the threshold
-TransPosFrac (def 1): Transductive postive fraction
libSVM: libSVM classification
options: -Kernel (def 0): Kernel Type. 0 for linear, 1 for polynomial, 2 for RBF, 3 for sigmoid
-KernelParam (def 0.05): Kernel Parameter. d for polynomial, g for RBF, a/b for sigmoid
-CostFactor (def 1): Cost Factor, roughly the ratio of positive and negative data
-Threshold (def 0): Classification threshold. Classified as positive if larger than the threshold
mySVM: mySVM classification
options: -Config (def N/A): the configuration file
kNN_classify: kNN classification<br> options: -k (def 1): number of neighbors
-d (def 2): distnace type. 0 for Euclidean, 1 for chi-squared, 2 for cosine-similarity
GMM_classify: Gaussian Mixture Model classification
options: -NumMix (def 1): number of mixture for each class
LDA_classify: Linear Discriminant Analysis classification<br> options: -RegFactor (def 0.1): Regularization factors
-QDA (def 0): 0 for LDA, 1 for QDA
IIS_classify: Maximum entopy model, IIS implementation<br> options: -Iter (def 50): number of iterations
-MinDiff (def 1e-7): Minimum difference of loglikelihood
-Sigma (def 0): Regularization factors
NeuralNet: Multi-layer perceptron (N/A for binary mode)<br> options: -NHidden (def 10): Hidden units
-NOut (def 1): Output units
-Alpha (def 0.2): Weight decay
-NCycles (def 10): Number of training cycles
LogitReg: Logistic regression
options: -RegFactor (def 0): Regularization factors
-CostFactor (def 1): Cost factors
LogitRegKernel: Kernel logistic regression<br> options: -RegFactor (def 0): Regularization factors
-Kernel (def 0): Kernel Type. 0 for linear, 1 for polynomial, 2 for RBF, 3 for sigmoid
-KernelParam (def 0.05): Kernel Parameter. d for polynomial, g for RBF, a/b for sigmoid
ZeroR: Do nothing, predict everthing as zero
Wekaclassify -- trees.J48: C4.5 decision trees
Wekaclassify -- bayes.NaiveBayes: Naive Bayes
More weka classifiers, please refer to its manual
</font></pre>
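<p><font color="#000000">An illustrative sketch using one of these base classifiers,
kernel logistic regression with an RBF kernel (the parameter values are examples
only):</font></p>
<pre>% illustrative sketch of a base classifier; parameter values are examples only
test_classify('classify -t DataExample1.txt -sf 1 -- cross_validate -t 3 -- LogitRegKernel -RegFactor 0.01 -Kernel 2 -KernelParam 0.1');</pre>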
</blockquote>
<H2>Getting started: some examples</H2>
<p> <b>Example 1</b></p>
<blockquote>
<p>Classify DataExample1.txt <br>
Shuffle the data before classification ('-sf 1')<br>
50%-50% train-test split (default)<br>
Linear Kernel Support Vector Machine</p>
</blockquote>
<pre>test_classify('classify -t DataExample1.txt -sf 1 -- LibSVM -Kernel 0 -CostFactor 3');</pre>
<p><b>Example 2</b></p>
<blockquote>
<p>Classify DataExample1.txt <br>
Shuffle the data before classification ('-sf 1')<br>
Reduce the number of dimensions to 15 ('-svd 15')<br>
3-fold cross-validation <br>
3-Nearest Neighbor</p>
</blockquote>
<pre>test_classify('classify -t DataExample1.txt -sf 1 -svd 15 -- cross_validate -t 3 -- kNN_classify -k 3');
</pre>
<p> <b>Example 3</b><br>
</p>
<blockquote>
<p>Classify DataExample2.txt <br>
Do not shuffle the data<br>
Use the first 100 data points for training, the rest for testing <br>
Apply a multi-class classification wrapper <br>
RBF Kernel SVM_LIGHT Support Vector Machine</p>
</blockquote>
<pre>test_classify('classify -t DataExample2.txt -sf 0 -- train_test_validate -t 100 -- train_test_multiple_class -- SVM_LIGHT -Kernel 2 -KernelParam 0.01 -CostFactor 3');</pre>
<p> <b>Example 4</b></p>
<blockquote>
<p>Train with DataExample2.train.txt, test with DataExample2.test.txt <br>
Do not shuffle the data<br>
Use the C4.5 decision trees provided by WEKA<br>
AdaBoost.M1 wrapper<br>
No multi-class wrapper for WEKA </p>
</blockquote>
<pre>test_classify(strcat('classify -t DataExample2.train.txt -sf 0 ', ...
' -- test_file_validate -t DataExample2.test.txt -- MCAdaBoostM1 -- WekaClassify -NoWrapper -- trees.J48'));</pre>
<p><b>Example 5</b><br>
</p>
<blockquote>
<p>Classify DataExample2.txt <br>
Do not shuffle the data<br>
Overwrite the output file ('-oflag w')<br>
Use the first 100 data points for training, the rest for testing <br>
Apply a stacking classification wrapper: first learn three classifiers on the
feature groups (1..120), (121..150) and (154..225), with majority voting on top <br>
Improved Iterative Scaling with 50 iterations</p>
</blockquote>
<pre>test_classify(strcat('classify -t DataExample2.txt -sf 0 -oflag w', ...
    ' -- train_test_validate -t 100 -- MCWithMultiFSet -Voting 1 -Separator 1,120,121,150,154,225 -- IIS_classify -Iter 50'));
</pre>
<p><b>Example 6</b></p>
<blockquote>
<p>Train a model on DataExample1.train.txt <br>
Save the trained model to DataExample1.libSVM.model ('-m')<br>
Linear Kernel Support Vector Machine</p>
</blockquote>
<pre>test_classify('classify -t DataExample1.train.txt -- train_only -m DataExample1.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3');</pre>
<blockquote>
<p>Test on the new data in DataExample1.test.txt <br>
Load the saved model DataExample1.libSVM.model ('-m')<br>
Linear Kernel Support Vector Machine</p>
</blockquote>
<pre>test_classify('classify -t DataExample1.test.txt -- test_only -m DataExample1.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3');</pre>
<p><b>Example 7</b></p>
<blockquote>
<p>Dimension reduction of DataExample2.txt<br>
Do not shuffle the data<br>
Reduce the data to 15 dimensions using SVD ('-svd 15')<br>
Save the reduced data to the intermediate file DataExample2_SVD15.txt ('-drf')<br>
Use the ZeroR classifier, which simply predicts everything as zero</p>
</blockquote>
<pre>test_classify('classify -t DataExample2.txt -sf 0 -svd 15 -drf DataExample2_SVD15.txt -- train_test_validate -t 1 -- ZeroR');
</pre>
<p><b>Example 8 (for binary code)</b></p>
<blockquote>
<p>Similar to Example 1, but run from the compiled binary, assuming the current directory is $MATLABArsenalRoot<br>
Classify DataExample1.txt <br>
Shuffle the data before classification ('-sf 1')<br>
3-fold cross-validation <br>
Linear Kernel Support Vector Machine</p>
</blockquote>
<pre>./test_classify.exe 'classify -t demo/DataExample1.txt -sf 1 -- cross_validate -t 3 -- LibSVM -Kernel 0 -CostFactor 3'</pre>
<p><b>Example 9 (for binary code)</b></p>
<blockquote>
<p>Similar to Example 8, but assuming the current directory is $MATLABArsenalRoot/demo<br>
Classify DataExample1.txt <br>
Shuffle the data before classification ('-sf 1')<br>
3-fold cross-validation <br>
Linear Kernel Support Vector Machine</p>
</blockquote>
<pre>../test_classify.exe 'classify -dir .. -t DataExample1.txt -sf 1 -- cross_validate -t 3 -- LibSVM -Kernel 0 -CostFactor 3'</pre>
<h2>Extensions and Additions</h2>
<FONT color=#000000>
<H2>Questions and Bug Reports</H2>
<P><FONT color=#000000>If you find bugs, or run into problems with the code that you
cannot solve yourself, please contact me via email at <a href="mailto:yanrong@cs.cmu.edu">yanrong@cs.cmu.edu</a>.
</font></P>
<H2>Disclaimer</H2>
<FONT color=#000000>
<P>This software is free for non-commercial use only. It must not be modified
or distributed without the prior permission of the author. The author is not responsible
for any implications arising from the use of this software. </P>
</FONT>
<h2>History</h2>
<h2>References</h2>
<P> <FONT color=#000000>Last modified May 3rd, 2004 by </font><A
href="http://www.cs.cmu.edu/%7Eyanrong" target=_top>Rong Yan</A></P>
</FONT>
</BODY></HTML>