of a rule, in the head (consequent), or in both. A description of the
format of this additional input file, including examples, can be found
<a href="#appearin">here</a>. If no item appearances file is given, the
rule selection is not restricted. (I am grateful to the people at
Integral Solutions Ltd., who developed the well-known data mining tool
<a href="http://www.spss.com/Clementine/">Clementine</a>, but are now
part of <a href="http://www.spss.com">SPSS</a>, for requesting the
possibility to restrict item appearances.)</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="select">Extended Item Set Selection</a></h3>
<p>Since version 4.20 there are extended selection possibilities for
frequent item sets, too. (These were added due to a cooperation with
Sonja Gruen, FU Berlin.)</p>
<!-- =============================================================== -->
<h4><a name="logquot">Binary Logarithm of Support Quotient</a></h4>
<p>An expected value for the support of an item set is computed from
the support values of the individual items, assuming independence.
Then the binary logarithm of the quotient of actual support and
expected support is computed. A minimum value for this measure can
be set with the option <tt>-d</tt>. In this case only frequent item
sets for which this measure exceeds the given threshold are kept.</p>
<!-- =============================================================== -->
<h4><a name="suppquot">Difference of Support Quotient to 1</a></h4>
<p>As with the preceding measure, the quotient of actual and expected
support of an item set is computed and compared to 1 (a value of 1
signifies independence of the items).
A minimum value for this measure
can be set with the option <tt>-d</tt>. In this case only frequent item
sets for which this measure exceeds the given threshold are kept.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="tatree">Transaction Prefix Tree</a></h3>
<p>The counting process can be sped up by organizing the transactions
into a prefix tree. That is, the items in each transaction are sorted
and then transactions with the same prefix are grouped together and
are counted, as one may say, in parallel. This way of organizing the
transactions was added in version 4.03 and is the default behavior now.
If you prefer that the transactions are treated individually (i.e., the
transactions are stored in a simple list and only one transaction is
counted at a time), use the option <tt>-h</tt>.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="options">Program Invocation and Options</a></h3>
<p>My apriori program is invoked as follows:</p>
<p><tt>apriori [options] infile outfile [appfile]</tt></p>
<p>The normal arguments are:</p>
<table border=0 cellpadding=0 cellspacing=0>
<tr><td>infile</td><td width=10></td>
    <td>file to read transactions from</td></tr>
<tr><td>outfile</td><td></td>
    <td>file to write association rules / hyperedges to</td></tr>
<tr><td>appfile</td><td></td>
    <td>file stating item appearances (optional)</td></tr>
</table>
<p>The possible
options are:</p>
<table border=0 cellpadding=0 cellspacing=0>
<tr><td><tt>-t#</tt></td><td width=10></td>
    <td>target type (default: association rules)</td></tr>
<tr><td><tt></tt></td><td width=10></td>
    <td>(s: item sets, c: closed item sets, m: maximal item sets,<br>
        <font color="white">(</font>r: association rules,
        h: association hyperedges)</td></tr>
<tr><td><tt>-m#</tt></td><td></td>
    <td>minimal number of items per set/rule/hyperedge
        (default: 1)</td></tr>
<tr><td><tt>-n#</tt></td><td></td>
    <td>maximal number of items per set/rule/hyperedge
        (default: 5)</td></tr>
<tr><td><tt>-s#</tt></td><td></td>
    <td>minimal support of a set/rule/hyperedge
        (default: 10%)</td></tr>
<tr><td><tt>-S#</tt></td><td></td>
    <td>maximal support of a set/rule/hyperedge
        (default: 100%)</td></tr>
<tr><td><tt>-c#</tt></td><td></td>
    <td>minimal confidence of a rule/hyperedge
        (default: 80%)</td></tr>
<tr><td><tt>-o</tt></td><td></td>
    <td>use original definition of the support of a rule
        (body &amp; head)</td></tr>
<tr><td><tt>-k#</tt></td><td></td>
    <td>item separator for output (default: " ")</td></tr>
<tr><td><tt>-p#</tt></td><td></td>
    <td>output format for support/confidence (default: "%.1f%%")</td></tr>
<tr><td><tt>-x</tt></td><td></td>
    <td>extended support output (print both rule support types)
        </td></tr>
<tr><td><tt>-a</tt></td><td></td>
    <td>print absolute support (number of transactions)</td></tr>
<tr><td><tt>-y</tt></td><td></td>
    <td>print lift value (confidence divided by prior)</td></tr>
<tr><td><tt>-e#</tt></td><td></td>
    <td>additional rule evaluation measure (default: none)</td></tr>
<tr><td><tt>-!</tt></td><td></td>
    <td>print a list of additional rule evaluation measures</td></tr>
<tr><td><tt>-d#</tt></td><td></td>
    <td>minimal value of additional evaluation measure
        (default: 10%)</td></tr>
<tr><td><tt>-v</tt></td><td></td>
    <td>print value of additional rule
evaluation measure</td></tr>
<tr><td><tt>-g</tt></td><td></td>
    <td>write output in scannable form
        (quote certain characters)</td></tr>
<tr><td><tt>-l</tt></td><td></td>
    <td>do not load transactions into memory
        (work on input file)</td></tr>
<tr><td><tt>-q#</tt></td><td></td>
    <td>sort items w.r.t. their frequency (default: 1)</td></tr>
<tr><td><tt></tt></td><td></td>
    <td>(1: ascending, -1: descending, 0: do not sort,</td></tr>
<tr><td><tt></tt></td><td></td>
    <td><font color="white">(</font>2: ascending, -2: descending
        w.r.t. transaction size sum)</td></tr>
<tr><td><tt>-u#</tt></td><td></td>
    <td>filter unused items from transactions (default: 0.5)</td></tr>
<tr><td><tt></tt></td><td></td>
    <td>(0: do not filter items w.r.t. usage in item sets,<br>
        &lt;0: fraction of removed items for filtering,<br>
        &gt;0: take execution times ratio into account)</td></tr>
<tr><td><tt>-h</tt></td><td></td>
    <td>do not organize transactions as a prefix tree</td></tr>
<tr><td><tt>-j</tt></td><td></td>
    <td>use quicksort to sort the transactions (default: heapsort)
        </td></tr>
<tr><td><tt>-z</tt></td><td></td>
    <td>minimize memory usage (default: maximize speed)</td></tr>
<tr><td><tt>-i#</tt></td><td></td>
    <td>ignore records starting with characters in the given
        string</td></tr>
<tr><td valign="top"><tt>-b/f/r#</tt></td><td></td>
    <td>blank characters, field and record separators</td></tr>
<tr><td><tt></tt></td><td></td>
    <td>(default: "<tt> \t\r</tt>", "<tt> \t</tt>", "<tt>\n</tt>")
        </td></tr>
</table>
<p>(<tt>#</tt> always means a number, a letter, or a string that
   specifies the parameter of the option.)</p>
<p>Note that the effect of the option <tt>-z</tt> can depend heavily
   on how the items are sorted (option <tt>-q</tt>). The highest savings
   in memory usually result if items are sorted with descending
   frequency (<tt>-q-1</tt>).
However, this often worsens the
   processing time considerably.</p>
<p>A note on the option <tt>-j</tt>: Constructing the prefix tree for
   the transactions requires sorting the transactions. Since version
   4.17 heapsort is the default sorting method for the transactions,
   because it turned out that in conjunction with the item sorting
   (and especially for artificial datasets like T10I4D100K) quicksort
   can lead to very bad processing times (almost worst case behavior,
   i.e., O(n<sup>2</sup>) run time for the sorting). However, sometimes
   this is not a problem, and then quicksort, which can be activated
   with the option <tt>-j</tt>, is slightly faster.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="input">Input Format</a></h3>
<h4><a name="transin">Format of the Transactions File</a></h4>
<p>A text file structured by field and record separators and blanks.
Record separators, not surprisingly, separate records (usually lines);
field separators separate fields or columns (usually words). Blanks are
used to fill fields (columns), e.g. to align them. In the transactions
file each record must contain one transaction, i.e. a list of item
identifiers, which are separated by field separators. An empty record
is interpreted as an empty transaction.</p>
<p>Examples can be found in the directory <tt>apriori/ex</tt> in the
source package.
Refer to the file <tt>apriori/ex/readme</tt>, which
explains how to process the different example files in this
directory.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<h4><a name="appearin">Format of the Item Appearances File</a></h4>
<p>A text file structured by field and record separators and blanks.
(Note: For this file the same field and record separators and blanks
are used as for the transactions file.)</p>
<p>The first record, which must have one field, contains the default
appearance to be used with all items not mentioned in the appearances
file. Other records state the appearance of specific items. The first
field states the item, the second the appearance indicator. If no
appearance indicator is given, the item will be ignored (i.e. may
appear neither in the body (antecedent) nor in the head (consequent)
of a rule).
Empty records are ignored.</p>
<p>The following appearance indicators are recognized:</p>
<ul type=circle>
<li>item may appear only in rule bodies (antecedents):<br>
    <tt>i in b body a ante antecedent</tt></li>
<li>item may appear only in rule heads (consequents):<br>
    <tt>o out h head c cons consequent</tt></li>
<li>item may appear in rule bodies (antecedents)
    or in rule heads (consequents):<br>
    <tt>io inout bh b&amp;h ac a&amp;c both</tt></li>
<li>item may appear neither in rule bodies (antecedents)
    nor in rule heads (consequents):<br>
    <tt>n neither none ign ignore -</tt></li>
</ul>
<p><b>Example 1:</b>
Generate only rules with item "x" in the consequent.</p>
<p><tt>in<br>
       x out</tt></p>
<p><b>Example 2:</b>
Item "x" may appear only in a rule head (consequent),
item "y" only in a rule body (antecedent);
appearance of all other items is not restricted.</p>
<p><tt>both<br>
       x head<br>
       y body</tt></p>
<p>Providing no item appearances file is equivalent to an item