of a rule, in the head (consequent), or in both. A description of the
format of this additional input file, including examples, can be found
<a href="#appearin">here</a>. If no item appearances file is given, the
rule selection is not restricted. (I am grateful to the people at
Integral Solutions Ltd., who developed the well-known data mining tool
<a href="http://www.spss.com/Clementine/">Clementine</a>, but are now
part of <a href="http://www.spss.com">SPSS</a>, for requesting the
possibility to restrict item appearances.)</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="select">Extended Item Set Selection</a></h3>
<p>Since version 4.20 there are extended selection possibilities for
frequent item sets, too. (These were added due to a cooperation with
Sonja Gruen, FU Berlin.)</p>
<!-- =============================================================== -->
<h4><a name="logquot">Binary Logarithm of Support Quotient</a></h4>
<p>An expected value for the support of an item set is computed from
the support values of the individual items, assuming independence.
Then the binary logarithm of the quotient of actual support and
expected support is computed. A minimum value for this measure can
be set with the option <tt>-d</tt>. In this case only frequent item
sets for which this measure exceeds the given threshold are kept.</p>
<!-- =============================================================== -->
<h4><a name="suppquot">Difference of Support Quotient to 1</a></h4>
<p>As with the preceding measure, the quotient of actual and expected
support of an item set is computed and compared to 1 (a value of 1
signifies independence of the items).
A minimum value for this measure
can be set with the option <tt>-d</tt>. In this case only frequent item
sets for which this measure exceeds the given threshold are kept.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="tatree">Transaction Prefix Tree</a></h3>
<p>The counting process can be sped up by organizing the transactions
into a prefix tree. That is, the items in each transaction are sorted
and then transactions with the same prefix are grouped together and
are counted, as one may say, in parallel. This way of organizing the
transactions was added in version 4.03 and is the default behavior now.
If you prefer that the transactions are treated individually (i.e., the
transactions are stored in a simple list and only one transaction is
counted at a time), use the option <tt>-h</tt>.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="options">Program Invocation and Options</a></h3>
<p>My apriori program is invoked as follows:</p>
<p><tt>apriori [options] infile outfile [appfile]</tt></p>
<p>The normal arguments are:</p>
<table border=0 cellpadding=0 cellspacing=0>
<tr><td>infile</td><td width=10></td>
    <td>file to read transactions from</td></tr>
<tr><td>outfile</td><td></td>
    <td>file to write association rules / hyperedges to</td></tr>
<tr><td>appfile</td><td></td>
    <td>file stating item appearances (optional)</td></tr>
</table>
<p>The possible options are:</p>
<table
border=0 cellpadding=0 cellspacing=0>
<tr><td><tt>-t#</tt></td><td width=10></td>
    <td>target type (default: association rules)</td></tr>
<tr><td><tt></tt></td><td width=10></td>
    <td>(s: itemsets, c: closed itemsets, m: maximal itemsets,<br>
        <font color="white">(</font>r: association rules, h: association hyperedges)</td></tr>
<tr><td><tt>-m#</tt></td><td></td>
    <td>minimal number of items per set/rule/hyperedge (default: 1)</td></tr>
<tr><td><tt>-n#</tt></td><td></td>
    <td>maximal number of items per set/rule/hyperedge (default: 5)</td></tr>
<tr><td><tt>-s#</tt></td><td></td>
    <td>minimal support of a set/rule/hyperedge (default: 10%)</td></tr>
<tr><td><tt>-S#</tt></td><td></td>
    <td>maximal support of a set/rule/hyperedge (default: 100%)</td></tr>
<tr><td><tt>-c#</tt></td><td></td>
    <td>minimal confidence of a rule/hyperedge (default: 80%)</td></tr>
<tr><td><tt>-o</tt></td><td></td>
    <td>use original definition of the support of a rule (body &amp; head)</td></tr>
<tr><td><tt>-k#</tt></td><td></td>
    <td>item separator for output (default: " ")</td></tr>
<tr><td><tt>-p#</tt></td><td></td>
    <td>output format for support/confidence (default: "%.1f%%")</td></tr>
<tr><td><tt>-x</tt></td><td></td>
    <td>extended support output (print both rule support types)</td></tr>
<tr><td><tt>-a</tt></td><td></td>
    <td>print absolute support (number of transactions)</td></tr>
<tr><td><tt>-y</tt></td><td></td>
    <td>print lift value (confidence divided by prior)</td></tr>
<tr><td><tt>-e#</tt></td><td></td>
    <td>additional rule evaluation measure (default: none)</td></tr>
<tr><td><tt>-!</tt></td><td></td>
    <td>print a list of additional rule evaluation measures</td></tr>
<tr><td><tt>-d#</tt></td><td></td>
    <td>minimal value of additional evaluation measure (default: 10%)</td></tr>
<tr><td><tt>-v</tt></td><td></td>
    <td>print value of additional rule evaluation measure</td></tr>
<tr><td><tt>-g</tt></td><td></td>
    <td>write output in scannable form (quote certain characters)</td></tr>
<tr><td><tt>-l</tt></td><td></td>
    <td>do
not load transactions into memory (work on input file)</td></tr>
<tr><td><tt>-q#</tt></td><td></td>
    <td>sort items w.r.t. their frequency (default: 1)</td></tr>
<tr><td><tt></tt></td><td></td>
    <td>(1: ascending, -1: descending, 0: do not sort,</td></tr>
<tr><td><tt></tt></td><td></td>
    <td><font color="white">(</font>2: ascending, -2: descending w.r.t. transaction size sum)</td></tr>
<tr><td><tt>-u#</tt></td><td></td>
    <td>filter unused items from transactions (default: 0.5)</td></tr>
<tr><td><tt></tt></td><td></td>
    <td>(0: do not filter items w.r.t. usage in item sets,<br>
        &lt;0: fraction of removed items for filtering,<br>
        &gt;0: take execution times ratio into account)</td></tr>
<tr><td><tt>-h</tt></td><td></td>
    <td>do not organize transactions as a prefix tree</td></tr>
<tr><td><tt>-j</tt></td><td></td>
    <td>use quicksort to sort the transactions (default: heapsort)</td></tr>
<tr><td><tt>-z</tt></td><td></td>
    <td>minimize memory usage (default: maximize speed)</td></tr>
<tr><td><tt>-i#</tt></td><td></td>
    <td>ignore records starting with characters in the given string</td></tr>
<tr><td valign="top"><tt>-b/f/r#</tt></td><td></td>
    <td>blank characters, field and record separators</td></tr>
<tr><td><tt></tt></td><td></td>
    <td>(default: "<tt> \t\r</tt>", "<tt> \t</tt>", "<tt>\n</tt>")</td></tr>
</table>
<p>(<tt>#</tt> always means a number, a letter, or a string that
specifies the parameter of the option.)</p>
<p>Note that the effect of the option <tt>-z</tt> can depend heavily on
how the items are sorted (option <tt>-q</tt>). The highest memory
savings usually result when items are sorted by descending frequency
(<tt>-q-1</tt>). However, this often worsens the processing time
considerably.</p>
<p>A note on the option <tt>-j</tt>: Constructing the prefix tree for
the transactions requires sorting the transactions.
Since version 4.17 heapsort is the default sorting method for the
transactions, because it turned out that in conjunction with the item
sorting (and especially for artificial datasets like T10I4D100K)
quicksort can lead to very bad processing times (almost worst case
behavior, i.e., O(n<sup>2</sup>) run time for the sorting). When this
is not a problem, quicksort is slightly faster; it can be activated
with the option <tt>-j</tt>.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>
<h3><a name="input">Input Format</a></h3>
<h4><a name="transin">Format of the Transactions File</a></h4>
<p>A text file structured by field and record separators and blanks.
Record separators, not surprisingly, separate records (usually lines);
field separators separate fields or columns (usually words). Blanks are
used to fill fields (columns), e.g. to align them. In the transactions
file each record must contain one transaction, i.e. a list of item
identifiers, which are separated by field separators. An empty record
is interpreted as an empty transaction.</p>
<p>Examples can be found in the directory <tt>apriori/ex</tt> in the
source package.
Refer to the file <tt>apriori/ex/readme</tt>, which
explains how to process the different example files in that
directory.</p>
<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>
<!-- =============================================================== -->
<h4><a name="appearin">Format of the Item Appearances File</a></h4>
<p>A text file structured by field and record separators and blanks.
(Note: For this file the same field and record separators and blanks
are used as for the transactions file.)</p>
<p>The first record, which must have one field, contains the default
appearance to be used with all items not mentioned in the appearances
file. Other records state the appearance of specific items. The first
field states the item, the second the appearance indicator. If no
appearance indicator is given, the item will be ignored (i.e. may
appear neither in the body (antecedent) nor in the head (consequent)
of a rule).
Empty records are ignored.</p>
<p>The following appearance indicators are recognized:</p>
<ul type=circle>
<li>item may appear only in rule bodies (antecedents):<br>
    <tt>i in b body a ante antecedent</tt></li>
<li>item may appear only in rule heads (consequents):<br>
    <tt>o out h head c cons consequent</tt></li>
<li>item may appear in rule bodies (antecedents) or in rule heads (consequents):<br>
    <tt>io inout bh b&amp;h ac a&amp;c both</tt></li>
<li>item may appear neither in rule bodies (antecedents) nor in rule heads (consequents):<br>
    <tt>n neither none ign ignore -</tt></li>
</ul>
<p><b>Example 1:</b>
Generate only rules with item "x" in the consequent.</p>
<p><tt>in<br>
       x out</tt></p>
<p><b>Example 2:</b>
Item "x" may appear only in a rule head (consequent),
item "y" only in a rule body (antecedent);
appearance of all other items is not restricted.</p>
<p><tt>both<br>
       x head<br>
       y body</tt></p>
<p>Providing no item appearances file is equivalent to an item