<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0059)http://www.ug.cs.usyd.edu.au/~abright/comp5318homework.html -->
<HTML><HEAD><TITLE>COMP Data Mining &amp; Machine Learning Assignment</TITLE>
<META http-equiv=Content-Type content="text/html; charset=big5">
<META content="MSHTML 6.00.2800.1528" name=GENERATOR>
<META content="" name=Author>
<META content="" name=Keywords>
<META content="" name=Description></HEAD>
<BODY>
<TABLE width="70%" align=center>
  <TBODY>
  <TR>
    <TD>
      <H1>
      <CENTER>Association Rule Mining - Apriori Algorithm
      <P>Adrian Bright 0120990</CENTER></H1><STRONG><A 
      href="mailto:abright@it.usyd.edu.au">abright@it.usyd.edu.au</A></STRONG> 

      <P><STRONG>Overview:</STRONG> 
      <UL>
        <LI>This program performs the first step of association rule mining 
        according to the Apriori algorithm. Given a set of transactions, it 
        identifies the frequent itemsets. 
        <LI>First, the candidate 1-itemsets are generated, and the 
        transactions are passed over them to produce the frequent 1-itemsets. The 
        candidate 2-itemsets are then produced from these by merging and 
        pruning, and used to populate the hash tree. The infrequent itemsets are 
        then removed from the tree. This process continues for k-itemsets until 
        the maximum itemset size is reached. 
        <LI>The algorithm handles datasets too large to hold in memory by 
        re-reading the transaction file on each pass over the candidate itemsets, 
        rather than loading all the transactions at once. 
        <LI>A hash tree data structure is used to store the candidate itemsets. 
        At each node is a bucket of itemsets and a HashMap which points to its 
        child nodes. </LI></UL><STRONG>Files:</STRONG> <STRONG><A 
      href="http://www.ug.cs.usyd.edu.au/~abright/compassignment.zip">compassignment.zip</A></STRONG> 
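      <P>The level-wise process described in the overview can be sketched in 
      Java roughly as follows. This is an illustrative sketch under stated 
      assumptions, not the code in compassignment.zip: the class 
      AprioriSketch and its method names are hypothetical, the transactions 
      are held in an in-memory list rather than re-read from a file on each 
      pass, candidate counts are kept in a plain map instead of a hash tree, 
      and an itemset is assumed to be frequent when its support is at least 
      the minimum percentage.</P>

```java
import java.util.*;

/** Illustrative sketch only; hypothetical names, not the classes in compassignment.zip. */
class AprioriSketch {

    /** Level-wise search: maps each frequent itemset (sorted list of item ids) to its support count. */
    static Map<List<Integer>, Integer> frequentItemsets(List<Set<Integer>> transactions,
                                                        double minSupportPercent) {
        Map<List<Integer>, Integer> result = new LinkedHashMap<>();
        // Candidate 1-itemsets: every distinct item, in ascending order.
        TreeSet<Integer> items = new TreeSet<>();
        for (Set<Integer> t : transactions) items.addAll(t);
        List<List<Integer>> candidates = new ArrayList<>();
        for (Integer item : items) candidates.add(List.of(item));

        while (!candidates.isEmpty()) {
            // One pass over the transactions counts the support of every candidate.
            // (Assumption: a plain map stands in for the hash tree.)
            Map<List<Integer>, Integer> counts = new HashMap<>();
            for (Set<Integer> t : transactions)
                for (List<Integer> c : candidates)
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
            // Keep the candidates that meet the threshold (assumed to be ">=").
            List<List<Integer>> frequent = new ArrayList<>();
            for (List<Integer> c : candidates) {
                int count = counts.getOrDefault(c, 0);
                if (count * 100.0 >= minSupportPercent * transactions.size()) {
                    frequent.add(c);
                    result.put(c, count);
                }
            }
            candidates = generateCandidates(frequent);
        }
        return result;
    }

    /** Merge step: join frequent k-itemsets that share their first k-1 items, then prune. */
    static List<List<Integer>> generateCandidates(List<List<Integer>> frequent) {
        Set<List<Integer>> known = new HashSet<>(frequent);
        List<List<Integer>> out = new ArrayList<>();
        for (int a = 0; a < frequent.size(); a++) {
            for (int b = a + 1; b < frequent.size(); b++) {
                List<Integer> x = frequent.get(a), y = frequent.get(b);
                int k = x.size();
                if (!x.subList(0, k - 1).equals(y.subList(0, k - 1))) continue;
                List<Integer> merged = new ArrayList<>(x);
                merged.add(y.get(k - 1));
                if (allSubsetsFrequent(merged, known)) out.add(merged);
            }
        }
        return out;
    }

    /** Prune step: a candidate survives only if every subset with one item removed is frequent. */
    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> known) {
        for (int skip = 0; skip < candidate.size(); skip++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(skip); // int argument, so this removes by index
            if (!known.contains(subset)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up demo data, not the contents of transactions.txt.
        List<Set<Integer>> transactions = List.of(
                Set.of(1, 2, 3), Set.of(1, 2), Set.of(1, 3), Set.of(2, 3), Set.of(1, 2, 3));
        for (Map.Entry<List<Integer>, Integer> e : frequentItemsets(transactions, 60).entrySet())
            System.out.printf("Itemset: %s, support: %.1f.%n",
                    e.getKey(), e.getValue() * 100.0 / transactions.size());
    }
}
```

      <P>Storing each itemset as a sorted list keeps the merge step simple: 
      two k-itemsets are joined only when their first k-1 items match, so 
      each (k+1)-candidate is produced once. The counting loop tests every 
      candidate against every transaction, which is the cost a hash tree is 
      meant to reduce.</P>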

      <P>
      <DL>
        <DT>ARM.java 
        <DD>This class deals with command line arguments and loads the Apriori 
        class. 
        <DT>Apriori.java 
        <DD>This class controls the various aspects of the program such as 
        candidate itemset generation and hashtree population. 
        <DT>HashTree.java 
        <DD>This class represents the data structure in which the candidate 
        itemsets are stored. 
        <DT>ItemRef.java 
        <DD>This class stores the identifier of each item, e.g. '1' may 
        represent "beer". 
        <DT>ItemSet.java 
        <DD>This class represents a single candidate itemset and its support. 
        <DT>ItemsInput.java 
        <DD>Loads items and transactions from text files and generates candidate 
        itemsets. 
        <DT>MyTreeNode.java 
        <DD>Represents a node in the HashTree. 
        <DT>StemDescription.java 
        <DD>Stores the definition of a stem, i.e. which keys hash into it. 
        <DT>Transaction.java 
        <DD>Represents a single transaction, containing a set of one or more items. 
        <DT>items.txt 
        <DD>A very small item definition set of six items. 
        <DT>transactions.txt 
        <DD>A very small transaction set of 10 transactions. 
      </DD></DL><STRONG>Directions:</STRONG> 
      <OL>
        <LI>javac *.java 
        <LI>java ARM [items_file] [transactions_file] [output_file] 
        [min_support_percentage]<BR>e.g. java ARM items.txt transactions.txt 
        output.txt 30 </LI></OL><STRONG>Input:</STRONG> 
      <UL>
        <LI>The items file must be in the format:
        <P><I>item_id item_name</I> 
        <P></P>
        <LI>The transactions file must be in one of the following two formats:
        <P><I>transaction_id item_id_1 item_id_2 item_id_3<BR></I>
        <P>OR
        <P><I>transaction_id item_id_1<BR>transaction_id 
        item_id_2<BR>transaction_id item_id_3</I> 
      </P></LI></UL><STRONG>Output:</STRONG> 
      <UL>
        <LI>The output is in the format:
        <P><I>Itemset: {frequent_itemset}, support: {support_percentage}.</I> 
        </P></LI></UL></TD></TR></TBODY></TABLE></BODY></HTML>