<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD><TITLE>Manpage of C4.5</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<STYLE TYPE="text/css">DIV.section {
	MARGIN-LEFT: 2cm
}
</STYLE>
<LINK REL=StyleSheet HREF="../../../../stylesheet/main.css" TYPE="text/css">
<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY>
<blockquote>
<H1>C4.5</H1>
<HR>
<A name=lbAB>&nbsp;</A> 
<H2>NAME</H2>
<P>c4.5 - form a decision tree from a file of examples <A name=lbAC>&nbsp;</A> 
<H2>SYNOPSIS</H2>
<P><B>c4.5</B> [ <B>-f</B> filestem ] [ <B>-u</B> ] [ <B>-s</B> ] [ <B>-p</B> ] 
[ <B>-v</B> verb ] [ <B>-t</B> trials ] 
<BR>&nbsp;&nbsp;&nbsp;[&nbsp;<B>-w</B>&nbsp;wsize&nbsp;] [ <B>-i</B> incr ] [ 
<B>-g</B> ] [ <B>-m</B> minobjs ] [ <B>-c</B> cf ] <A name=lbAD>&nbsp;</A> 
<H2>DESCRIPTION</H2>
<P><I>C4.5</I> is a program for inducing classification rules in the form of 
decision trees from a set of given examples. 
<P>All files read and written by C4.5 are of the form <I>filestem.ext</I> where 
<I>filestem</I> is a file name stem that identifies the induction task and 
<I>ext</I> is an extension that defines the type of file. The program expects to 
find at least two files: a <B>names file</B> <I>filestem.names</I> defining 
class, attribute and attribute value names, and a <B>data file</B> 
<I>filestem.data</I> containing a set of objects, each of which is described by 
its values of each of the attributes and its class. 
<P>The program can generate trees in two ways. In <I>batch</I> mode (the 
default), the program generates a single tree using all the available data. In 
<I>iterative</I> mode, the program starts with a randomly-selected subset of the 
data (the <I>window</I>), generates a trial decision tree, adds some 
misclassified objects, and continues until the trial decision tree correctly 
classifies all objects not in the window or until it appears that no progress is 
being made. Since iterative mode starts with a randomly-selected subset, 
multiple trials with the same data can be used to generate more than one tree. 
<P>All trees generated in the process are saved in <I>filestem.unpruned.</I> 
After each tree is generated, it is <I>pruned</I> in an attempt to simplify it. 
The `best' pruned tree (selected by the program if there is more than one 
trial) is saved in machine-readable form in <I>filestem.tree.</I> 
<P>All trees produced, both pre- and post-simplification, are evaluated on the 
training data. If required, they can also be evaluated on unseen data in file 
<I>filestem.test.</I> 
<P><A name=lbAE></A> 
<H2>FILE FORMATS</H2>The <B>names file</B> <I>filestem.names</I> is a series of 
entries defining names of attributes, attribute values and classes. The file is 
free-format with the exception that the vertical bar `|' causes the remainder of 
that line to be ignored. Each entry is terminated by a period which may be 
omitted if it is the last character of a line. 
<P>The file commences with the names of the classes, separated by commas and 
terminated with a period. Each name consists of a string of characters that does 
not include comma, question mark or colon (unless preceded by a backslash). A 
period may be embedded in a name provided it is not followed by a space. 
Embedded spaces are also permitted but multiple whitespace is replaced by a 
single space. The rest of the file consists of a single entry for each 
attribute. An attribute entry begins with the attribute name followed by a 
colon, and then either the word `ignore' (indicating that this attribute should 
not be used), the word `continuous' (indicating that the attribute has real 
values), the word `discrete' followed by an integer <I>n</I> (indicating that 
the program should assemble a list of up to <I>n</I> possible values), or a list 
of all possible discrete values separated by commas. (This last form for 
discrete attributes is recommended as it enables input to be checked.) Each 
entry is terminated with a period (but see above). 
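<P>As a minimal sketch of these rules, a names file for a hypothetical task 
with filestem <I>weather</I> might look as follows. The class names, attribute 
names and values are invented for illustration and are not part of C4.5. 
<PRE>
| weather.names (hypothetical example)
play, stay.                      | the two classes

outlook: sunny, overcast, rain.  | discrete, all values listed
temperature: continuous.
humidity: continuous.
windy: true, false.
</PRE>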
<P>The <B>data file</B> <I>filestem.data</I> contains one line per object. Each 
line contains the values of the attributes in order followed by the object's 
class, with all entries separated by commas. The rules for valid names in the 
<B>names file</B> also hold for the names in the <B>data file.</B> An unknown 
value of an attribute is indicated by a question mark `?'. If a <B>test file</B> 
<I>filestem.test</I> is used, it has the same format as the data file. 
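<P>For illustration, assuming a hypothetical names file that declares the 
attributes outlook, temperature, humidity and windy (in that order) and the 
classes play and stay, the first lines of the matching data file might be: 
<PRE>
sunny, 85, 85, false, stay
overcast, 83, ?, false, play
rain, 70, 96, true, stay
</PRE>
<P>The question mark in the second line marks an unknown humidity value. 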
<P><A name=lbAF></A> 
<H2>OPTIONS</H2>Options and their meanings are: 
<P>
<DL compact>
  <DT><B>-f</B><I>filestem</I> 
  <DD>Specify the filename stem (default <B>DF</B>). 
  <DT><B>-u</B> 
  <DD>Evaluate trees produced on unseen cases in file <I>filestem.test.</I> 
  <DT><B>-s</B> 
  <DD>Force `subsetting' of all tests based on discrete attributes with more 
  than two values. C4.5 will construct a test with a subset of values associated 
  with each branch. 
  <DT><B>-p</B> 
  <DD>Use probabilistic thresholds for continuous attributes (see Quinlan, 
  1987a). 
  <DT><B>-t</B><I>trials</I> 
  <DD>Set iterative mode with specified number of trials. 
  <DT><B>-v</B><I>verb</I> 
  <DD>Set the verbosity level [0-3] (default 0). This option generates more 
  voluminous output that may help to explain what the program is doing (but 
  don't count on it); see the manual entry for <I>verbose.</I> </DD></DL>
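<P>As a sketch of how these options combine, the following command lines assume 
a hypothetical filestem <I>weather</I>, with <I>weather.names</I> and 
<I>weather.data</I> in the current directory (and <I>weather.test</I> for the 
first command): 
<PRE>
c4.5 -f weather -u
c4.5 -f weather -t 10 -v 1
</PRE>
<P>The first builds a single tree in batch mode and also evaluates it on the 
unseen cases. The second runs iterative mode with 10 trials at verbosity 
level 1. 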
<P>The following options are also available but need not be used except for 
experimentation with tree construction: 
<DL compact>
  <DT><B>-w</B><I>wsize</I> 
  <DD>Set the size of the initial window (default is the larger of 20 percent 
  of the number of data objects and twice the square root of that number). 
  <DT><B>-i</B><I>incr</I> 
  <DD>Set the maximum number of objects that can be added to the window at each 
  iteration (default is 20 percent of the initial window size). 
  <DT><B>-g</B> 
  <DD>Use the gain criterion to select tests. The default uses the gain ratio 
  criterion. 
  <DT><B>-m</B><I>minobjs</I> 
  <DD>Require that, in every test, at least two branches each contain at least 
  <I>minobjs</I> objects (default 2). 
  <DT><B>-c</B><I>cf</I> 
  <DD>Set the pruning confidence level (default 25%). </DD></DL><A 
name=lbAG>&nbsp;</A> 
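<P>As a worked example of the default window size and increment described for 
<B>-w</B> and <B>-i</B> (the object counts are hypothetical, and how the 
program rounds fractional values is not stated here): 
<PRE>
1000 data objects:  20 percent = 200,  2 * sqrt(1000) = 63.2 (approx.)
                    initial window = 200
                    increment = 20 percent of 200 = 40

  50 data objects:  20 percent = 10,   2 * sqrt(50) = 14.1 (approx.)
                    initial window = 14.1 before rounding
                    increment = 20 percent of that = 2.8 before rounding
</PRE>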
<H2>FILES</H2>
<P>c4.5 <BR>filestem.data <BR>filestem.names <BR>filestem.unpruned (unpruned 
trees) <BR>filestem.tree (final decision tree) <BR>filestem.test (unseen data) 
<P><A name=lbAH></A> 
</blockquote>
</BODY></HTML>
