.EN
.TH C4.5 1
.SH NAME
.PP
c4.5 \- form a decision tree from a file of examples
.SH SYNOPSIS
.PP
.B c4.5
[ \fB-f\fR filestem ]
[ \fB-u\fR ]
[ \fB-s\fR ]
[ \fB-p\fR ]
[ \fB-v\fR verb ]
[ \fB-t\fR trials ]
[ \fB-w\fR wsize ]
[ \fB-i\fR incr ]
[ \fB-g\fR ]
[ \fB-m\fR minobjs ]
[ \fB-c\fR cf ]
.SH DESCRIPTION
.PP
.I C4.5
is a program for inducing classification rules in the form
of decision trees from a set of given examples.
.PP
All files read and written by C4.5 are of the form
.I filestem.ext
where
.I filestem
is a file name stem that identifies the induction task and
.I ext
is an extension that defines the type of file.
The program expects to find at least two files: a
.B names file
.I filestem.names
defining class, attribute and attribute value names, and a
.B data file
.I filestem.data
containing a set of objects, each of which is described by its
values of each of the attributes and its class.
.PP
The program can generate trees
in two ways.  In
.I batch
mode (the default), the program generates a single tree
using all the available data.
In
.I iterative
mode,
the program starts with a randomly-selected subset of the
data (the
.IR window ),
generates a trial decision tree, adds some misclassified
objects, and continues until the trial decision tree
correctly classifies all objects not in the window or
until it appears that no progress is being made.
Since iterative mode starts with a randomly-selected subset,
multiple trials with the same data can be used to generate
more than one tree.
.PP
All trees generated in the process are saved in
.IR filestem.unpruned .
After each tree is generated, it is
.I pruned
in an attempt to simplify it.
The `best' pruned tree (selected by the program if there is
more than one trial)
is saved in machine-readable form in
.IR filestem.tree .
.PP
All trees produced, both pre- and post-simplification, are evaluated
on the training data.
If required, they can also be evaluated
on unseen data in file
.IR filestem.test .
.SH FILE FORMATS
The
.B names file
.I filestem.names
is a series of entries defining names of attributes,
attribute values and classes.  The file is free-format
with the exception that the vertical bar `|' causes the
remainder of that line to be ignored.
Each entry is terminated by a period which may be
omitted if it is the last character of a line.
.PP
The file
commences with the names of the classes, separated by
commas and terminated with a period.  Each name consists of
a string of characters that does not include comma, question mark
or colon (unless preceded by a backslash).  A period may be
embedded in a name provided it is not followed by a space.
Embedded spaces are also permitted but multiple whitespace is
replaced by a single space.
The rest of the file consists of a single entry for each
attribute.  An attribute entry begins with the attribute name
followed by a colon, and then either the word `ignore' (indicating
that this attribute should not be used), the word `continuous'
(indicating that the attribute has real values),
the word `discrete' followed by an integer
.I n
(indicating that the program should assemble
a list of up to
.I n
possible values), or a list
of all possible discrete values separated by commas.  (The latter
form for discrete attributes is recommended as it
enables input to be checked.)  Each
entry is terminated with a period (but see above).
.PP
The
.B data file
.I filestem.data
contains one line per object.
Each line contains
the values of the attributes in order followed by the
object's class, with all entries separated by commas.
The rules for valid names in the
.B names file
also hold for the names in the
.B data file.
An unknown value of an attribute is indicated by a
question mark `?'.
If a
.B test file
.I filestem.test
is used, it has the same format as the data file.
.SH OPTIONS
Options and their meanings are:
.PP
.TP 12
.BI \-f " filestem"
Specify the filename stem (default
.BR DF )
.TP
.B \-u
Evaluate trees produced on unseen cases in file
.IR filestem.test .
.TP
.B \-s
Force `subsetting' of all tests based on discrete attributes
with more than two values.  C4.5 will construct a test with
a subset of values associated with each branch.
.TP
.B \-p
Probabilistic thresholds used for continuous attributes (see Quinlan, 1987a).
.TP
.BI \-t " trials"
Set iterative mode with specified number of trials.
.TP
.BI \-v " verb"
Set the verbosity level [0-3] (default 0).
This option generates more voluminous output that may help to
explain what the program is doing (but don't count on it);
see the manual entry for
.IR verbose .
.PP
The following options are also available but need not
be used except for experimentation with tree construction:
.TP 12
.BI \-w " wsize"
Set the size of the initial window
(default is the maximum of 20 percent of the number of data objects
and twice the square root of that number).
.TP
.BI \-i " incr"
Set the maximum number of objects that can be
added to the window at each iteration
(default is 20 percent of the initial window size).
.TP
.B \-g
Use the gain criterion to select tests.  The default
uses the gain ratio criterion.
.TP
.BI \-m " minobjs"
In all tests, at least two branches must contain a minimum number
of objects (default 2).
This option allows the minimum
number to be altered.
.TP
.BI \-c " cf"
Set the pruning confidence level (default 25%).
.SH FILES
.PP
.in 8
c4.5
.br
filestem.data
.br
filestem.names
.br
filestem.unpruned  (unpruned trees)
.br
filestem.tree   (final decision tree)
.br
filestem.test   (unseen data)
.in 0
.PP
.SH SEE ALSO
.PP
consult(1)
.PP
.SH BUGS
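.SH EXAMPLES
.PP
The Python sketch below is not part of C4.5 and does not reproduce its parser;
it only illustrates the names file and data file layouts described under
FILE FORMATS.
It assumes a simplified names file: one entry per line, no backslash escapes,
and no periods embedded in names.
The sample class and attribute names, and the helper names
parse_names and parse_data_line, are hypothetical.
.PP
.nf
```python
# Hypothetical names file: class names first, then one entry per attribute.
SAMPLE_NAMES = """
play, dont_play.        | class names
outlook: sunny, overcast, rain.
temperature: continuous.
id: ignore.
windy: discrete 2
"""


def strip_comment(line):
    # A vertical bar causes the remainder of the line to be ignored.
    return line.split("|", 1)[0].strip()


def parse_names(text):
    """Return (classes, attributes) from a simplified names file.

    Each attribute is a tuple (name, kind, values), where kind is
    "ignore", "continuous" or "discrete". For "discrete n" entries,
    values is the integer n; for explicit lists it is a list of strings.
    """
    entries = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        # The terminating period may be omitted at the end of a line.
        if line.endswith("."):
            line = line[:-1]
        entries.append(line)

    classes = [c.strip() for c in entries[0].split(",")]
    attributes = []
    for entry in entries[1:]:
        name, spec = entry.split(":", 1)
        spec = spec.strip()
        if spec in ("ignore", "continuous"):
            kind, values = spec, None
        elif spec.startswith("discrete "):
            kind, values = "discrete", int(spec.split()[1])
        else:
            kind, values = "discrete", [v.strip() for v in spec.split(",")]
        attributes.append((name.strip(), kind, values))
    return classes, attributes


def parse_data_line(line, attributes):
    """Split one data line into (attribute values, class name).

    Unknown values ("?") become None and continuous values become
    floats; all other values are kept as strings.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != len(attributes) + 1:
        raise ValueError(
            "expected %d fields, got %d" % (len(attributes) + 1, len(fields))
        )
    values = []
    for field, (name, kind, _) in zip(fields, attributes):
        if field == "?":
            values.append(None)
        elif kind == "continuous":
            values.append(float(field))
        else:
            values.append(field)
    return values, fields[-1]
```
.fi
.PP
In this sketch an ignored attribute still occupies its column in each data
line, since the data file lists the values of the attributes in order;
its value is simply passed through as a string.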
