📄 apriorit.html
字号:
2 4 (2)2 5 (1)2 6 (1)3 4 (1)</PRE></TD><TD><PRE> </PRE></TD><TD><PRE>4 5 (1)4 6 (1)1 3 4 (1)2 4 5 (1)2 4 6 (1)</PRE></TD></TABLE></CENTER><P>These can be presented in a T-tree of the form given in Figure 1 (note thereverse nature of the tree).The internal representation of this "reverse" T-tree founded on arrays ofT-tree nodes that can beconceptualised as shown in Figure 2. The storage required for each node(representing a frequent set) in the T-tree is then 12 Bytes:</P><OL><LI> Reference to T-tree node structure (4 Bytes)<LI> Support count field in T-tree node structure (4 Bytes)<LI> Reference to child array field in T-tree node structure (4 Bytes)</OL><P>Thus house keeping requirements are still 8 Bytes, however storage gainsare obtained because it is not necessary to explicitly store individualattribute labels (i.e. column numbers representing instantiated elements) asthese are implied by the indexing. Of course this approach must also requirestorage for "stubs" (4 Bytes) where nodes are missing (unsupported). Overallthe storage advantages for this technique is thus, in part, dependent on thenumber of missing combinations contained in the data set.</P><BR><CENTER><IMG SRC = "reverseTtree.gif" BORDER = 0 ALT = "REVERSE T-TREE"><P><B>Figure 1</B>: <I>Conceptual example of the T-tree data structure</I></P></CENTER><CENTER><IMG SRC = "t-treeStruct3.gif" BORDER = 0 ALT = "VERSION 3 OF T-TREE"><P><B>Figure 2</B>: <I>Internal representation of T-tree presented in Figure 1</I></P></CENTER><br><hr><br><a NAME = "aprioriT"><table BORDER-=0 WIDTH="100%" CELLPADDING=5 BGCOLOR="006699"><tr><td><h2><font color =" white">6. THE APRIOR-T ALGORITHM</font></h2></td></table><BR><P>The T-tree described above is built in an apriori manner, as first proposedby Agrawal and Srikant (1994), starting withthe one item sets and continuing until there are no more candidate N-itemsets.Thus, at a high level, a standard apriori algorithm is used.</P><PRE>K <-- 1nextlevelFlag=true;generate candidate K-itemsetsLoop count support values for candidate K-itemsets prune unsupported K-itemsets K <-- 2 generate candidate K2 itemsets from previous level if no K2 itemsets breakend Loop</PRE><P>In more detail the Apriori-T algorithm commences with a method<TT>createTotalSupportTree</TT> which is presentedin Table 1. The method commences by generating the top level of the T-tree(<TT>createTtreeTopLevel</TT>) and then generating the next level(<TT>generateLevel2</TT>) from the supported sets in level 1 --- remember thatif a 1-itemset is not supported none of its super sets will be supported.Once we have generated level 2 further levels can be generated(<TT>createTtreeLevelN</TT>).</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: createTotalSupportTreeArguments: noneReturn: noneFields: NA------------------------------createTtreeTopLevel()generateLevel2()createTtreeLevelN()------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 1</B>: <I>The</I> <TT>createTotalSupportTree</TT> <I>method</I></P></CENTER><P>The method to generate the top level of a T-tree is as presented in Table 2.Note that the method includes acall to a general T-tree utility method <TT>pruneLevelN</TT> described later.</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: createTtreeTopLevelArguments: noneReturn: noneFields: D number of attributes startTtreeRef start of T-tree dataArray 2D array holding input sets--------------------------------------Dimension and initialise top level of T-tree (length = D)Loop from i = 0 to i = number of records in dataArray Loop from j = 0 to j = number of attributes in dataArray[i] startTtreeRef[i][j]++ End loopEnd LooppruneLevelN(startTtreeRef,1)--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 2</B>: <I>The</I> <TT>createTtreeTopLevel</TT> <I>method</I></P></CENTER><P>The <TT>generateLevel2</TT> methods loops through the top level of theT-tree creating new T-tree arrays where appropriate (i.e. where the immediateparent nodes is supported). The method is outlined in Table 3. Note that themethod includes a call to a general T-tree utility method <TT>generateNextLevel</TT>(also described later).</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: generateLevel2Arguments: noneReturn: noneFields: startTtreeRef start of T-tree nextlevelFlag set to true if next level exists--------------------------------------nextlevelFlag <-- falseloop from i = 0 to i = number of nodes in T-tree top level if startTtreeRef[i] != null generateNextLevel(startTtreeRef,i,{i})End Loop--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 3</B>: <I>The</I> <TT>createTtreeTopLevel</TT> <I>method</I></P></CENTER><P>Once we have a top level T-tree and a set of candidate second levels(arrays) we can proceed with generating the rest of the T-tree using aniterative process --- the <TT>createTtreeLevelN</TT> method presented in Table4. The <TT>createTtreeLevelN</TT> method calls a number ofother methods <TT>addSupportToTtreeLevelN</TT>, <TT>pruneLevelN</TT> (alsocalled by the <TT>createTtreeTopLevel</TT> method) and<TT>generateLevelN</TT> which arepresented in Tables 5, 6 and 7 respectively.</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: createTtreeLevelNArguments: noneReturn: noneFields: startTtreeRef start of T-tree nextlevelFlag set to true if next level exists--------------------------------------K <-- 2while (nextlevelFlag) addSupportToTtreeLevelN(K) pruneLevelN(startTtreeRef,K) nextlevelFlag <-- false generateLevelN(startTtreeRef,K,{}) K <-- K+1End loop--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 4</B>: <I>The</I> <TT>createTtreeLevelN</TT> <I>method</I></P></CENTER><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: addSupportToTtreeLevelNArguments: K the current levelReturn: noneFields: startTtreeRef start of T-tree dataArray 2D array holding input sets--------------------------------------Loop from i = 0 to i = number of records in dataArray length = number of attributes in dataArray[i] addSupportToTtreeFindLevel(startTtreeRef,K,length, dataArray[i]End loop--------------------------------------Method: addSupportToTtreeFindLevelArguments: linkref reference to current array in T-tree K level marker length length of array at current branch in t-tree record input data record under considerationReturn: noneFields: None--------------------------------------if (K=1) Loop from i = 0 to i = length if (linkref[record[i]] != null) increment linkref[record[i]].support by 1 End if End Loopelse Loop from i = K-1 to i = length if (linkref[record[i]] != null && linkref[record[i]].childRef != null) addSupportToTtreeFindLevel(linkref[record[i]].childRe, K-1,i,record) End if End loopend if else --------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 5</B>: <I>The</I> <TT>addSupportToTtreeLevelN</TT> <I>method</I>and its related <TT>addSupportToTtreeFindLevel</TT> method</P></CENTER><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: pruneLevelNArguments: linkref reference to current array in T-tree K level markerReturn: true if entire array prunedFields: minSupport the minimum support threshold--------------------------------------if (K=1) allUnsupported <-- true Loop from i = 1 to i = length of array if (linkref[i] != null) if (linkref[i].support < minSupport) linkref[i] <-- null else allUnsupported <-- false End if else End if return allUnsupported End Loopelse Loop from i = K to i = length of array if (linkref[i] != null) if (pruneLevelN(linkref[i].childRef,K-1) linkref[i].childRef <-- null End if End if End loopEnd if else --------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 6</B>: <I>The</I> <TT>pruneLevelN</TT></P></CENTER><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: generateLevelNArguments: linkref reference to current array in T-tree K level marker I the item set represented by the parent nodeReturn: NoneFields: None--------------------------------------if (K=1) Loop from i = 2 to i = length of array if (linkref[i] != null) generateNextLevel(linkref,i,I union i) End if End loopelse Loop from i = K to i = length of array if (linkref[i] != null && linkref[i].childRef != null) generateLevelN(linkref[i].childRef,K-1,I union i--------------------------------------Method: generateNextLevelArguments: linkref reference to current array in T-tree i index to parent node in vurrent array I the item set represented by the parent nodeReturn: NoneFields: nextLevelExists flafg set to true or false--------------------------------------linkref[i].childRef <-- array of empty t-tree nodes of length iLoop from j = 1 to j = i if (linkRef[j] != null) newI <-- I union j if (all newI true subsets are all supported in the T-tree sofar) linkRef[i].childRef[j] <-- new T-tree Node nextLevelExists <-- true else linkRef[i].childRef[j] <-- null End if else End ifEnd loop--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 7</B>: <I>The</I> <TT>generateLevelN</TT> <I>method</I>and its related <TT>generateNextLevel</TT> method</P></CENTER><br><hr><br><A name = "conclusions"><table BORDER-=0 WIDTH="100%" CELLPADDING=5 BGCOLOR="006699"><tr><td><h2><font color =" white">7. CONCLUSSIONS</font></h2></td></table> <BR><P>The Apriori-T ARM algorithm described herehave been use successfully by the LUCS-KDD reseach group to contrast and compare a variety of ARM algorithms and techniques. The software is available forfree, however the author would appreciate appropriate acknowledgement. The following reference format for referring to the Apriori-T implementation available hereis suggested:</P><OL><LI>Coenen, F. (2004), <I>The LUCS-KDDApriori-T Association Rule Mining Algorrithm</I>,http://www.cxc.liv.ac.uk/~frans/KDD/Software/Apriori_T/aprioriT.html,Department of Computer Science, The University of Liverpool, UK.</OL><P>Should you discover any "bugs" or other problems within the software (or this documentation), do not hesitate to contact the author.</P><br><hr><br><table BORDER-=0 WIDTH="100%" CELLPADDING=5 BGCOLOR="006699"><tr><td><h2><font color =" white">REFERENCES</font></h2></td></table> <BR><OL><LI>Agrawal, R. and Srikant, R. (1994),<I>t algorithms for mining association rules</I>,In proceedings of 20th VLDB Conference, Morgan Kaufman, pp487-499.<LI>Coenen and Leng (2004).<I>Data Structures for Association Rule Mining: T-trees and P-trees</I>To appear in IEEE Transaction in Knowledge and Data Engineering. </OL><br><hr><br><p>Created and maintained by<a HREF=http://www.csc.liv.ac.uk/~frans/>Frans Coenen</a>.Last updated 17 March 2004</body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -