<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Manpage of C4.5</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<STYLE TYPE="text/css">DIV.section {
MARGIN-LEFT: 2cm
}
</STYLE>
<LINK REL=StyleSheet HREF="../../../../stylesheet/main.css" TYPE="text/css">
<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY>
<blockquote>
<H1>C4.5</H1>
<HR>
<A name=lbAB> </A>
<H2>NAME</H2>
<P>c4.5 - form a decision tree from a file of examples <A name=lbAC> </A>
<H2>SYNOPSIS</H2>
<P><B>c4.5</B> [ <B>-f</B> filestem ] [ <B>-u</B> ] [ <B>-s</B> ] [ <B>-p</B> ]
[ <B>-v</B> verb ] [ <B>-t</B> trials ]
<BR> [ <B>-w</B> wsize ] [ <B>-i</B> incr ] [
<B>-g</B> ] [ <B>-m</B> minobjs ] [ <B>-c</B> cf ] <A name=lbAD> </A>
<H2>DESCRIPTION</H2>
<P><I>C4.5</I> is a program for inducing classification rules in the form of
decision trees from a set of given examples.
<P>All files read and written by C4.5 are of the form <I>filestem.ext</I> where
<I>filestem</I> is a file name stem that identifies the induction task and
<I>ext</I> is an extension that defines the type of file. The program expects to
find at least two files: a <B>names file</B> <I>filestem.names</I> defining
class, attribute and attribute value names, and a <B>data file</B>
<I>filestem.data</I> containing a set of objects, each of which is described by
its values of each of the attributes and its class.
<P>The program can generate trees in two ways. In <I>batch</I> mode (the
default), the program generates a single tree using all the available data. In
<I>iterative</I> mode, the program starts with a randomly-selected subset of the
data (the <I>window</I>), generates a trial decision tree, adds some
misclassified objects, and continues until the trial decision tree correctly
classifies all objects not in the window or until it appears that no progress is
being made. Since iterative mode starts with a randomly-selected subset,
multiple trials with the same data can be used to generate more than one tree.
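<P>The iterative mode described above can be sketched roughly as follows. This
is a minimal illustration, not the program's own code: <I>build</I> and
<I>classify</I> are hypothetical callables standing in for tree construction and
classification, the default sizes follow the descriptions of the <B>-w</B> and
<B>-i</B> options, and the integer truncation and the minimum increment of 1 are
assumptions. The "no progress" stopping rule is not modelled; the loop simply
ends when nothing outside the window is misclassified.
<PRE>
```python
import math
import random


def default_window_size(n):
    # Documented default for -w: the larger of 20% of the data and
    # twice the square root of the number of objects.
    # Truncating to an integer is an assumption of this sketch.
    return int(max(0.2 * n, 2 * math.sqrt(n)))


def default_increment(wsize):
    # Documented default for -i: 20% of the initial window size.
    # The floor of 1 is an assumption so that the loop always advances.
    return max(1, int(0.2 * wsize))


def windowing(data, build, classify, seed=0):
    """Grow a window until the trial model fits everything outside it.

    data     -- list of (attributes, label) pairs
    build    -- hypothetical callable: list of pairs -> model
    classify -- hypothetical callable: (model, attributes) -> label
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    wsize = default_window_size(len(data))
    incr = default_increment(wsize)
    window, rest = indices[:wsize], indices[wsize:]
    while True:
        # Trial model from the current window only.
        model = build([data[i] for i in window])
        # Objects outside the window that the trial model gets wrong.
        wrong = [i for i in rest if classify(model, data[i][0]) != data[i][1]]
        if not wrong:
            return model, window
        # Add at most incr misclassified objects and try again.
        added = wrong[:incr]
        window = window + added
        rest = [i for i in rest if i not in added]
```
</PRE>
<P>Because the initial window is drawn at random, calling <I>windowing</I> with
different seeds can yield different models, which is why several trials may be
worth running.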
<P>All trees generated in the process are saved in <I>filestem.unpruned.</I>
After each tree is generated, it is <I>pruned</I> in an attempt to simplify it.
The `best' pruned tree (selected by the program if there is more than one
trial) is saved in machine-readable form in <I>filestem.tree</I>.
<P>All trees produced, both pre- and post-simplification, are evaluated on the
training data. If required, they can also be evaluated on unseen data in file
<I>filestem.test.</I>
<P><A name=lbAE></A>
<H2>FILE FORMATS</H2>The <B>names file</B> <I>filestem.names</I> is a series of
entries defining names of attributes, attribute values and classes. The file is
free-format with the exception that the vertical bar `|' causes the remainder of
that line to be ignored. Each entry is terminated by a period which may be
omitted if it is the last character of a line.
<P>The file commences with the names of the classes, separated by commas and
terminated with a period. Each name consists of a string of characters that does
not include comma, question mark or colon (unless preceded by a backslash). A
period may be embedded in a name provided it is not followed by a space.
Embedded spaces are also permitted but multiple whitespace is replaced by a
single space. The rest of the file consists of a single entry for each
attribute. An attribute entry begins with the attribute name followed by a
colon, and then either the word `ignore' (indicating that this attribute should
not be used), the word `continuous' (indicating that the attribute has real
values), the word `discrete' followed by an integer <I>n</I> (indicating that
the program should assemble a list of up to <I>n</I> possible values), or a list
of all possible discrete values separated by commas. (The latter form for
discrete attributes is recommended as it enables input to be checked.) Each
entry is terminated with a period (but see above).
<P>The <B>data file</B> <I>filestem.data</I> contains one line per object. Each
line contains the values of the attributes in order followed by the object's
class, with all entries separated by commas. The rules for valid names in the
<B>names file</B> also hold for the names in the <B>data file.</B> An unknown
value of an attribute is indicated by a question mark `?'. If a <B>test file</B>
<I>filestem.test</I> is used, it has the same format as the data file.
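<P>As an illustration of the two formats, the sketch below writes a small names
file and data file and parses one data line. The task, its class and attribute
names, the filestem and the helper functions are all hypothetical; only the
layout follows the rules given above. Backslash escapes are not handled.
<PRE>
```python
import os
import tempfile

# Classes first, then one entry per attribute, each ended by a period.
NAMES = """\
| Hypothetical task; text after a vertical bar is ignored.
play, dont_play.

outlook: sunny, overcast, rain.
temperature: continuous.
humidity: continuous.
windy: true, false.
"""

# One object per line: attribute values in order, then the class.
DATA = """\
sunny, 85, 85, false, dont_play
overcast, 83, ?, false, play
rain, 70, 96, false, play
rain, 65, 70, true, dont_play
"""


def write_task(directory, filestem):
    """Write filestem.names and filestem.data and return their paths."""
    paths = []
    for ext, text in (("names", NAMES), ("data", DATA)):
        path = os.path.join(directory, filestem + "." + ext)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    return paths


def parse_data_line(line):
    """Split one data line into attribute values and a class.

    Unknown values, written as `?`, are returned as None.
    Backslash escapes are not handled in this sketch.
    """
    fields = [field.strip() for field in line.split(",")]
    values = [None if field == "?" else field for field in fields[:-1]]
    return values, fields[-1]
```
</PRE>
<P>Calling <I>write_task(tempfile.mkdtemp(), "golf")</I> would produce
<I>golf.names</I> and <I>golf.data</I> in a fresh temporary directory.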
<P><A name=lbAF></A>
<H2>OPTIONS</H2>Options and their meanings are:
<P>
<DL compact>
<DT><B>-f</B><I>filestem</I>
<DD>Specify the filename stem (default <B>DF</B>).
<DT><B>-u</B>
<DD>Evaluate trees produced on unseen cases in file <I>filestem.test.</I>
<DT><B>-s</B>
<DD>Force `subsetting' of all tests based on discrete attributes with more
than two values. C4.5 will construct a test with a subset of values associated
with each branch.
<DT><B>-p</B>
<DD>Probabilistic thresholds used for continuous attributes (see Quinlan,
1987a).
<DT><B>-t</B><I>trials</I>
<DD>Set iterative mode with specified number of trials.
<DT><B>-v</B><I>verb</I>
<DD>Set the verbosity level [0-3] (default 0). This option generates more
voluminous output that may help to explain what the program is doing (but
don't count on it); see the manual entry for <I>verbose.</I> </DD></DL>
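<P>A script that drives the program might assemble its arguments along these
lines. This is only a sketch: the function name and its keyword arguments are
hypothetical, and only the option letters come from the list above.
<PRE>
```python
def build_command(filestem, unseen=False, subsets=False,
                  probabilistic=False, trials=None, verbosity=None):
    """Return an argument list for c4.5 using the options listed above."""
    argv = ["c4.5", "-f", filestem]
    if unseen:
        argv.append("-u")            # also evaluate on filestem.test
    if subsets:
        argv.append("-s")            # force subsetting of discrete tests
    if probabilistic:
        argv.append("-p")            # probabilistic thresholds
    if trials is not None:
        argv += ["-t", str(trials)]  # iterative mode with this many trials
    if verbosity is not None:
        if verbosity not in (0, 1, 2, 3):
            raise ValueError("verbosity must be between 0 and 3")
        argv += ["-v", str(verbosity)]
    return argv
```
</PRE>
<P>Assuming a <I>c4.5</I> executable is on the search path, the list could be
passed to <I>subprocess.run(build_command("golf", unseen=True), check=True)</I>.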
<P>The following options are also available but need not be used except for
experimentation with tree construction:
<DL compact>
<DT><B>-w</B><I>wsize</I>
<DD>Set the size of the initial window (default is the larger of 20 percent of
the number of data objects and twice the square root of that number).
<DT><B>-i</B><I>incr</I>
<DD>Set the maximum number of objects that can be added to the window at each
iteration (default is 20 percent of the initial window size).
<DT><B>-g</B>
<DD>Use the gain criterion to select tests. The default uses the gain ratio
criterion.
<DT><B>-m</B><I>minobjs</I>
<DD>In all tests, at least two branches must contain a minimum number of
objects (default 2). This option allows the minimum number to be altered.
<DT><B>-c</B><I>cf</I>
<DD>Set the pruning confidence level (default 25%). </DD></DL><A
name=lbAG> </A>
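<P>The difference between the two criteria selected by <B>-g</B> can be seen in
a small sketch. It uses the textbook definitions of information gain and gain
ratio for a discrete attribute; the function names are hypothetical, and the
program's treatment of unknown values, continuous attributes and the
<B>-m</B> minimum is not modelled.
<PRE>
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def gain_and_ratio(values, labels):
    """Return (gain, gain ratio) for splitting on a discrete attribute.

    values -- attribute value of each object
    labels -- class of each object, in the same order
    """
    total = len(labels)
    # Group the class labels by attribute value, one group per branch.
    branches = {}
    for value, label in zip(values, labels):
        branches.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(part) / total * entropy(part)
                    for part in branches.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the branch sizes themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio
```
</PRE>
<P>An attribute with a distinct value for every object achieves the largest
possible gain, but its split information is also large, so its gain ratio is
reduced accordingly.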
<H2>FILES</H2>
<P>c4.5 <BR>filestem.data <BR>filestem.names <BR>filestem.unpruned (unpruned
trees) <BR>filestem.tree (final decision tree) <BR>filestem.test (unseen data)
<P><A name=lbAH></A>
</blockquote>
</BODY></HTML>