/*
 * File:        sample_list_file_format.txt
 * Purpose:     Description of the data file formats of the SVM
 *
 * Author:      Mark O. Stitson
 * Created:     00/00/97
 * Updated:     20/11/97
 *
 *
 * Copyright (c) 1997  RHBNC London - All rights reserved
 * THIS IS PROPRIETARY SOURCE CODE of RHBNC London
 */

Format for loading and saving the sample_list_c:

The sample list is an ASCII file containing only numbers. The first
few numbers indicate the exact format, followed by the data.
The sample_list can load several formats but only saves one format.

The first number (int) of the sample_list file always contains the number of
examples in the file.

The second number (int) of the sample_list file always contains the
dimensionality of the input space.

The third number (int) of the sample_list file determines the format
of the file. If it is positive, it is the dimensionality of the
classification of the examples (version -1); this has to be 1 in this
version. If it is zero or negative, it indicates the version number of the
file format (0 for version 0 and -1 for version 1 in the examples below).

Version -1:

The rest of the file simply consists of examples: first the input
values of an example, then its classification.

Say we have four examples in two-dimensional input space and the
classification follows the function f(x_1,x_2) = 2*x_1 + x_2.

The input file should look something like this:

4
2
1
1   1   3
1.5 3.4 6.4
1.2 0   2.4
0   3   3

Version 0:

The fourth number (int) indicates the dimensionality of the classification of
the examples. (In this version this has to be 1.)

The fifth number (0/1) indicates whether or not the data has been
pre-scaled. This is useful if other data should be scaled in the same
way this data has been scaled.

The sixth number (0/1) indicates whether or not the classifications
have been scaled.

The seventh number indicates the lower bound of the scaled data.

The eighth number indicates the upper bound of the scaled data.

Then follows a list of the thresholds used for scaling (double).
It has as many elements as there are dimensions in input space plus the
number of dimensions of the classification.

Then follows a list of scaling factors (double). It has as many
elements as the previous list.

For an exact explanation of how scale factors and thresholds are
calculated, see the section on scaling. Note that these scale factors
are the factors that have previously been applied to the data. They
will not be applied to the data when loading.

The rest of the file simply consists of examples: first the input
values of an example, then its classification.

Say we have four examples in two-dimensional input space and the
classification follows the function f(x_1,x_2) = 2*x_1 + x_2.
The input values were scaled to lie between -1 and 1 before being put into
the list; the classifications were left unscaled.
The original data points are the same as in the version -1 example.

The input file should look something like this:

4
2
0
1
1
0
-1
1
-0.75    -1.7       0
1.333333 0.58823529 1
0.333333 -0.4117647 3
1        1          6.4
0.6      -1         2.4
-1       0.76470588 3

Version 1:

The fourth number (int) indicates the dimensionality of the classification of
the examples. (In this version this has to be 1.)

The fifth number (0/1) indicates whether or not the data has
individual epsilon values per example.

The sixth number (0/1) indicates whether or not the data has been
pre-scaled. This is useful if other data should be scaled in the same
way this data has been scaled.

The seventh number (0/1) indicates whether or not the classifications
have been scaled.

The eighth number indicates the lower bound of the scaled data.

The ninth number indicates the upper bound of the scaled data.

Then follows a list of the thresholds used for scaling (double). It
has as many elements as there are dimensions in input space plus the
number of dimensions of the classification.

Then follows a list of scaling factors (double). It has as many
elements as the previous list.

For an exact explanation of how scale factors and thresholds are
calculated, see the section on scaling.
Note that these scale factors
are the factors that have previously been applied to the data. They
will not be applied to the data when loading.

The rest of the file simply consists of examples: first the input
values of an example, then its classification.

Say we have four examples in two-dimensional input space and the
classification follows the function f(x_1,x_2) = 2*x_1 + x_2.
The data was scaled before being put into the list between -1 and 1.
The original data points are the same as in the version -1 example.

The input file should look something like this:

4
2
-1
1
1
1
0
0
0
0   0   0
1   1   1
1   1   3   0.1
1.5 3.4 6.4 0.2
1.2 0   2.4 0.1
0   3   3   0.2

Scaling

Scaling has to be used when values become unmanageable for the
optimizer used in the SV Machine. Some values reduce the numerical
accuracy to such an extent that no solution can be found anymore.

Scaling a set of numbers N works as follows:

We are given the lower and upper bound (lb,ub) between which the
scaling should occur.

Find the maximum and minimum value in N: max(N), min(N)

Calculate the scaling factor: s=(ub-lb)/(max(N)-min(N))

Calculate the threshold: t=(lb/s)-min(N)

Scale all samples x:
x_s=(x+t)*s

Loading and saving chunks

Chunks are very similar to sample lists except that they store the
alpha values as well as the samples.

The chunk is an ASCII file containing only numbers. The first
few numbers indicate the exact format, followed by the data.
The chunk can load several formats but only saves one format.

The first number (int) of the chunk file always contains the number of
examples in the file.

The second number (int) of the chunk file always contains the
dimensionality of the input space.

The third number (int) of the chunk file determines the format
of the file. If it is positive, it is the dimensionality of the
classification of the examples (version -1); this has to be 1 in this
version.
If it is negative it indicates the version number of the file
format.

Version -1:

The fourth number (int) indicates the number of alpha values per
example.

This is followed by an example as described in the sample list,
followed by the alpha values of that example, and then the next example
and so on.

Version 0:

This version can additionally handle multiple epsilons.

The fourth number (int) indicates the dimensionality of the classification of
the examples. (In this version this has to be 1.)

The fifth number (int) indicates the number of alpha values per example.

The sixth number (0/1) indicates whether or not the data has
individual epsilon values per example.

This is followed by an example as described in the sample list,
followed by the alpha values of that example, and then the next example
and so on.
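
Illustration (not part of the original SVM sources): the scaling formulas
from the section on scaling and the version -1 sample list layout can be
sketched in Python as below. The function names scale_params, apply_scaling
and parse_sample_list are hypothetical and chosen only for this sketch. The
sketch assumes whitespace-separated numbers and does not handle a set N whose
values are all equal, a case the description above does not cover.

```python
def scale_params(values, lb, ub):
    """Return (threshold t, scaling factor s) for a set of numbers N."""
    lo, hi = min(values), max(values)
    # s = (ub - lb) / (max(N) - min(N)); fails if all values are equal
    s = (ub - lb) / (hi - lo)
    # t = (lb / s) - min(N)
    t = lb / s - lo
    return t, s


def apply_scaling(values, t, s):
    """Scale every sample x with x_s = (x + t) * s."""
    return [(x + t) * s for x in values]


def parse_sample_list(text):
    """Read a version -1 sample list from a string (hypothetical helper).

    Returns (number of examples, input dimensionality, examples), where
    each example is a pair (input values, classification values).
    """
    tokens = text.split()
    n_examples = int(tokens[0])
    input_dim = int(tokens[1])
    class_dim = int(tokens[2])  # positive in version -1
    if class_dim <= 0:
        raise ValueError("this sketch only reads version -1 files")
    width = input_dim + class_dim
    values = [float(tok) for tok in tokens[3:]]
    if len(values) != n_examples * width:
        raise ValueError("unexpected number of data values")
    examples = []
    for i in range(n_examples):
        row = values[i * width:(i + 1) * width]
        examples.append((row[:input_dim], row[input_dim:]))
    return n_examples, input_dim, examples


if __name__ == "__main__":
    # First input column of the version -1 example, scaled to [-1, 1].
    x1 = [1, 1.5, 1.2, 0]
    t, s = scale_params(x1, -1, 1)
    print("threshold:", t, "factor:", s)
    print("scaled:", apply_scaling(x1, t, s))
```

Since x_s = (x + t) * s expands to (x - min(N)) * s + lb, the smallest value
in N maps to lb and the largest to ub.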