<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0059)http://www.ug.cs.usyd.edu.au/~abright/comp5318homework.html -->
<HTML><HEAD><TITLE>COMP Data Mining & Machine Learning Assignment</TITLE>
<META http-equiv=Content-Type content="text/html; charset=big5">
<META content="MSHTML 6.00.2800.1528" name=GENERATOR>
<META content="" name=Author>
<META content="" name=Keywords>
<META content="" name=Description></HEAD>
<BODY>
<TABLE width="70%" align=center>
<TBODY>
<TR>
<TD>
<H1>
<CENTER>Association Rule Mining - Apriori Algorithm
<P>Adrian Bright 0120990</CENTER></H1><STRONG><A
href="mailto:abright@it.usyd.edu.au">abright@it.usyd.edu.au</A></STRONG>
<P><STRONG>Overview:</STRONG>
<UL>
<LI>This program performs the first step of association rule mining using
the Apriori algorithm: given a set of transactions, it identifies the
frequent itemsets.
<LI>First, candidate 1-itemsets are generated, and the transactions are
passed over them to produce the frequent 1-itemsets. Candidate 2-itemsets
are then produced from these by merging and pruning, and are used to
populate the hash tree. The transactions are passed over the tree to count
the support of each candidate, and the infrequent itemsets are then removed
from the tree. This process continues for k-itemsets until the maximum
itemset size is reached.
<LI>The program handles datasets too large to fit in memory by re-reading
the transaction file each time the transactions are passed over the
candidate itemsets, rather than loading all the transactions into memory.
<LI>A hash tree data structure is used to store the candidate itemsets.
Each node holds a bucket of itemsets and a HashMap that points to its
child nodes. </LI></UL><STRONG>Files:</STRONG> <STRONG><A
href="http://www.ug.cs.usyd.edu.au/~abright/compassignment.zip">compassignment.zip</A></STRONG>
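<P>The merging and pruning step described in the Overview can be illustrated with a short Java sketch. It is not taken from the assignment's Apriori.java: the names AprioriGen, generateCandidates and allSubsetsFrequent are hypothetical, and the sketch assumes integer item identifiers, distinct frequent (k-1)-itemsets each stored as an ascending-sorted int array, and Java 9 or later. Two itemsets are merged when they agree on all but their last item, and a merged candidate is pruned if any of its (k-1)-subsets is not frequent.</P>

```java
import java.util.Arrays;

/** Hypothetical sketch of Apriori candidate generation: join, then prune. */
public class AprioriGen {

    /** Lexicographic order on equal-length itemsets sorted in ascending order. */
    static int compareItemsets(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    /** Builds candidate k-itemsets from distinct, sorted frequent (k-1)-itemsets. */
    static int[][] generateCandidates(int[][] frequent) {
        int n = frequent.length;
        if (n == 0) return new int[0][];
        int[][] sorted = frequent.clone();
        Arrays.sort(sorted, AprioriGen::compareItemsets);
        int m = sorted[0].length;                    // m = k - 1
        int[][] out = new int[n * (n - 1) / 2][];    // at most one candidate per pair
        int count = 0;
        for (int i = 0; i < n; i++) {
            int[] p = sorted[i];
            for (int j = i + 1; j < n; j++) {
                int[] q = sorted[j];
                // Join: p and q must agree on their first m - 1 items. The list is
                // sorted, so once the prefix differs no later q can match.
                if (!Arrays.equals(p, 0, m - 1, q, 0, m - 1)) break;
                int[] cand = Arrays.copyOf(p, m + 1);
                cand[m] = q[m - 1];                   // q's last item is larger than p's
                if (allSubsetsFrequent(cand, sorted)) out[count++] = cand;
            }
        }
        return Arrays.copyOf(out, count);
    }

    /** Prune: every (k-1)-subset of the candidate must itself be frequent. */
    static boolean allSubsetsFrequent(int[] cand, int[][] sorted) {
        int k = cand.length;
        // Dropping either of the last two items yields p or q, which are frequent,
        // so only the first k - 2 positions need to be checked.
        for (int drop = 0; drop < k - 2; drop++) {
            int[] sub = new int[k - 1];
            for (int s = 0, t = 0; s < k; s++) {
                if (s != drop) sub[t++] = cand[s];
            }
            if (Arrays.binarySearch(sorted, sub, AprioriGen::compareItemsets) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] frequent2 = { {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4} };
        for (int[] cand : generateCandidates(frequent2)) {
            System.out.println(Arrays.toString(cand));
        }
    }
}
```

<P>In this example the join step produces {1,2,3}, {1,2,4}, {1,3,4} and {2,3,4}; pruning discards the last two because their subset {3,4} is not among the frequent 2-itemsets, leaving {1,2,3} and {1,2,4}.</P>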
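<P>The Overview notes that the transaction file is re-read on every pass instead of being loaded into memory. A single streaming support-counting pass might look like the following sketch, which is an illustration under stated assumptions rather than the program's own code. The names SupportPass, countSupport and tally are hypothetical; item identifiers are assumed to be non-negative integers, and lines belonging to one transaction are assumed to be adjacent. Consecutive lines that share a transaction_id are merged, so both transaction-file layouts given under Input are accepted. For brevity, candidates are checked with a linear scan rather than through the hash tree.</P>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch: one support-counting pass that streams the transactions. */
public class SupportPass {

    /**
     * Reads "transaction_id item_id ..." lines and increments counts[c] for every
     * candidate c contained in a transaction. Adjacent lines with the same
     * transaction_id are merged into one transaction. Returns the number of
     * transactions read.
     */
    static int countSupport(BufferedReader in, int[][] candidates, int[] counts)
            throws IOException {
        BitSet items = new BitSet();   // items of the transaction being assembled
        String currentId = null;
        int transactions = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.trim().split("\\s+");
            if (fields[0].isEmpty()) continue;            // skip blank lines
            if (!fields[0].equals(currentId)) {           // a new transaction starts
                if (currentId != null) {
                    tally(items, candidates, counts);
                    transactions++;
                }
                currentId = fields[0];
                items.clear();
            }
            for (int f = 1; f < fields.length; f++) {
                items.set(Integer.parseInt(fields[f]));   // assumes integer item ids
            }
        }
        if (currentId != null) {                          // flush the final transaction
            tally(items, candidates, counts);
            transactions++;
        }
        return transactions;
    }

    /** Increments the count of every candidate whose items are all present. */
    static void tally(BitSet items, int[][] candidates, int[] counts) {
        for (int c = 0; c < candidates.length; c++) {
            boolean contained = true;
            for (int item : candidates[c]) {
                if (!items.get(item)) {
                    contained = false;
                    break;
                }
            }
            if (contained) counts[c]++;
        }
    }

    public static void main(String[] args) throws IOException {
        // T1 uses the one-line layout; T2 is split over two lines.
        String data = "T1 1 2 3\nT2 1\nT2 3\nT3 2 3\n";
        int[][] candidates = { {1, 3}, {2, 3}, {1, 2} };
        int[] counts = new int[candidates.length];
        int n = countSupport(new BufferedReader(new StringReader(data)), candidates, counts);
        double minSupportPct = 60.0;                      // e.g. [min_support_percentage]
        for (int c = 0; c < candidates.length; c++) {
            double pct = 100.0 * counts[c] / n;
            if (pct >= minSupportPct) {
                System.out.printf("Itemset: %s, support: %.1f%%.%n",
                        Arrays.toString(candidates[c]), pct);
            }
        }
    }
}
```

<P>Here {1,3} and {2,3} each occur in 2 of the 3 transactions (about 66.7%) and pass the 60% threshold, while {1,2} occurs in only one. In a real run, each pass would open a fresh reader on the transactions file (for example with java.nio.file.Files.newBufferedReader); the StringReader only keeps the example self-contained. Either way, only the items of the current transaction are held in memory.</P>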
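<P>The hash tree from the Overview can be sketched in simplified form as follows. This is a hypothetical version, not the program's HashTree.java. Each node keeps a bucket of candidates, but a fixed-size child array indexed by item mod FANOUT stands in for the HashMap of child nodes, and the values of FANOUT and MAX_LEAF are arbitrary. A leaf at depth d is split once its bucket grows past MAX_LEAF, using each candidate's item at position d (counting from 0) to choose a child. To count a transaction, every interior node hashes each remaining transaction item and descends. At a leaf, each candidate is checked against the whole transaction, and a per-candidate lastTid field stops a candidate from being counted twice when several paths lead to the same leaf.</P>

```java
import java.util.Arrays;

/** Hypothetical hash-tree sketch for counting the support of candidate k-itemsets. */
public class HashTreeSketch {

    static final int FANOUT = 3;    // assumed number of hash buckets per interior node
    static final int MAX_LEAF = 2;  // assumed leaf capacity before a split

    static class Candidate {
        final int[] items;          // sorted ascending
        int support = 0;
        int lastTid = -1;           // last transaction that counted this candidate
        Candidate(int... items) { this.items = items; }
    }

    static class Node {
        Candidate[] bucket = new Candidate[MAX_LEAF + 1];
        int size = 0;
        Node[] children = null;     // null while the node is a leaf
    }

    final Node root = new Node();
    final int k;                    // size of every candidate in this tree

    HashTreeSketch(int k) { this.k = k; }

    static int hash(int item) { return Math.floorMod(item, FANOUT); }

    void insert(Candidate c) { insert(root, c, 0); }

    private void insert(Node node, Candidate c, int depth) {
        if (node.children != null) {                     // interior: hash item at depth
            insert(node.children[hash(c.items[depth])], c, depth + 1);
            return;
        }
        if (node.size == node.bucket.length) {
            node.bucket = Arrays.copyOf(node.bucket, node.size * 2);
        }
        node.bucket[node.size++] = c;
        // Split an overfull leaf unless all k items have already been hashed.
        if (node.size > MAX_LEAF && depth < k) {
            Candidate[] old = Arrays.copyOf(node.bucket, node.size);
            node.children = new Node[FANOUT];
            for (int h = 0; h < FANOUT; h++) node.children[h] = new Node();
            node.bucket = null;
            node.size = 0;
            for (Candidate o : old) insert(node, o, depth);   // redistribute
        }
    }

    /** Counts one transaction; items sorted ascending, tid unique per transaction. */
    void count(int[] transaction, int tid) { count(root, transaction, 0, tid); }

    private void count(Node node, int[] t, int start, int tid) {
        if (node.children == null) {                     // leaf: check each candidate
            for (int i = 0; i < node.size; i++) {
                Candidate c = node.bucket[i];
                if (c.lastTid != tid && containsAll(t, c.items)) {
                    c.support++;
                    c.lastTid = tid;
                }
            }
            return;
        }
        for (int j = start; j < t.length; j++) {         // interior: try each remaining item
            count(node.children[hash(t[j])], t, j + 1, tid);
        }
    }

    static boolean containsAll(int[] sortedTransaction, int[] items) {
        for (int item : items) {
            if (Arrays.binarySearch(sortedTransaction, item) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        HashTreeSketch tree = new HashTreeSketch(2);
        Candidate[] cands = { new Candidate(1, 2), new Candidate(1, 3), new Candidate(2, 3),
                              new Candidate(2, 4), new Candidate(3, 4) };
        for (Candidate c : cands) tree.insert(c);
        int[][] transactions = { {1, 2, 3}, {1, 3}, {2, 3, 4} };
        for (int tid = 0; tid < transactions.length; tid++) {
            tree.count(transactions[tid], tid);
        }
        for (Candidate c : cands) {
            System.out.println(Arrays.toString(c.items) + " support " + c.support);
        }
    }
}
```

<P>With k = 2 and the three transactions in main, the supports come out as {1,2}: 1, {1,3}: 2, {2,3}: 2, {2,4}: 1 and {3,4}: 1, which matches a brute-force count. The hashing only narrows down which leaves a transaction visits; the full containment check at the leaf is what guarantees correctness.</P>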
<P>
<DL>
<DT>ARM.java
<DD>This class handles the command-line arguments and loads the Apriori
class.
<DT>Apriori.java
<DD>This class controls the main stages of the program, such as
candidate itemset generation and hash tree population.
<DT>HashTree.java
<DD>This class represents the data structure in which the candidate
itemsets are stored.
<DT>ItemRef.java
<DD>This class stores the identifier of each item; e.g. '1' may
represent "beer".
<DT>ItemSet.java
<DD>This class represents a single candidate itemset and its support.
<DT>ItemsInput.java
<DD>Loads items and transactions from text files and generates candidate
itemsets.
<DT>MyTreeNode.java
<DD>Represents a node in the HashTree.
<DT>StemDescription.java
<DD>Stores the definition of a stem, i.e. which keys hash into it.
<DT>Transaction.java
<DD>Represents a single transaction, containing a set of one or more items.
<DT>items.txt
<DD>A very small item definition set of six items.
<DT>transactions.txt
<DD>A very small transaction set of 10 transactions.
</DD></DL><STRONG>Directions:</STRONG>
<OL>
<LI>javac *.java
<LI>java ARM [items_file] [transactions_file] [output_file]
[min_support_percentage]<BR>e.g. java ARM items.txt transactions.txt
output.txt 30 </LI></OL><STRONG>Input:</STRONG>
<UL>
<LI>The items file must be in the format:
<P><I>item_id item_name</I>
<P></P>
<LI>The transactions file must be in the format:
<P><I>transaction_id item_id_1 item_id_2 item_id_3<BR></I>
<P>OR
<P><I>transaction_id item_id_1<BR>transaction_id
item_id_2<BR>transaction_id item_id_3</I>
</P></LI></UL><STRONG>Output:</STRONG>
<UL>
<LI>The output is in the format:
<P><I>Itemset: {frequent_itemset}, support: {support_percentage}.</I>
</P></LI></UL></TD></TR></TBODY></TABLE></BODY></HTML>