📄 apriorit.html

📁 apriori algorithm using datasets implementation
💻 HTML
📖 第 1 页 / 共 2 页
字号:
上一页 12
2 4 (2)2 5 (1)2 6 (1)3 4 (1)</PRE></TD><TD><PRE> </PRE></TD><TD><PRE>4 5 (1)4 6 (1)1 3 4 (1)2 4 5 (1)2 4 6 (1)</PRE></TD></TABLE></CENTER><P>These can be presented in a T-tree of the form given in Figure 1 (note thereverse nature of the tree).The internal representation of this "reverse" T-tree founded on arrays ofT-tree nodes that can beconceptualised as shown in Figure 2. The storage required for each node(representing a frequent set) in the T-tree is then 12 Bytes:</P><OL><LI> Reference to T-tree node structure (4 Bytes)<LI> Support count field in T-tree node structure (4 Bytes)<LI> Reference to child array field in T-tree node structure (4 Bytes)</OL><P>Thus house keeping requirements are still 8 Bytes, however storage gainsare obtained because it is not necessary  to explicitly store individualattribute labels (i.e. column numbers representing instantiated elements) asthese are implied by the indexing. Of course this approach must also requirestorage for "stubs" (4 Bytes) where nodes are missing (unsupported). Overallthe storage advantages for this technique is thus, in part, dependent on thenumber of missing combinations contained in the data set.</P><BR><CENTER><IMG	SRC = "reverseTtree.gif"	BORDER = 0	ALT = "REVERSE T-TREE"><P><B>Figure 1</B>: <I>Conceptual example of the T-tree data structure</I></P></CENTER><CENTER><IMG	SRC = "t-treeStruct3.gif"	BORDER = 0	ALT = "VERSION 3 OF T-TREE"><P><B>Figure 2</B>: <I>Internal representation of T-tree presented in Figure 1</I></P></CENTER><br><hr><br><a NAME = "aprioriT"><table BORDER-=0 WIDTH="100%" CELLPADDING=5 BGCOLOR="006699"><tr><td><h2><font color =" white">6. THE APRIOR-T ALGORITHM</font></h2></td></table><BR><P>The T-tree described above is built in an apriori manner, as first proposedby Agrawal and Srikant (1994), starting withthe one item sets and continuing until there are no more candidate N-itemsets.Thus, at a high level, a standard apriori algorithm is used.</P><PRE>K <-- 1nextlevelFlag=true;generate candidate K-itemsetsLoop    count support values for candidate K-itemsets    prune unsupported K-itemsets    K <-- 2    generate candidate K2 itemsets from previous level    if no K2 itemsets breakend Loop</PRE><P>In more detail the Apriori-T algorithm commences with a method<TT>createTotalSupportTree</TT> which is presentedin Table 1. The method commences by generating the top level of the T-tree(<TT>createTtreeTopLevel</TT>) and then generating the next level(<TT>generateLevel2</TT>) from the  supported sets in level 1 --- remember thatif a 1-itemset is not supported none of its super sets will be supported.Once we have generated level 2 further levels can be generated(<TT>createTtreeLevelN</TT>).</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: createTotalSupportTreeArguments: noneReturn: noneFields: NA------------------------------createTtreeTopLevel()generateLevel2()createTtreeLevelN()------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 1</B>: <I>The</I> <TT>createTotalSupportTree</TT> <I>method</I></P></CENTER><P>The method to generate the top level of a T-tree is as presented in Table 2.Note that the method includes acall to a general T-tree utility method <TT>pruneLevelN</TT> described later.</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: createTtreeTopLevelArguments: noneReturn: noneFields: D number of attributes        startTtreeRef start of T-tree	dataArray 2D array holding input sets--------------------------------------Dimension and initialise top level of T-tree (length = D)Loop from i = 0 to i = number of records in dataArray    Loop from j = 0 to j = number of attributes in dataArray[i]    	startTtreeRef[i][j]++    End loopEnd LooppruneLevelN(startTtreeRef,1)--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 2</B>: <I>The</I> <TT>createTtreeTopLevel</TT> <I>method</I></P></CENTER><P>The <TT>generateLevel2</TT> methods loops through the top level of theT-tree creating new T-tree arrays where appropriate (i.e. where the immediateparent nodes is supported). The method is outlined in Table 3. Note that themethod includes a call to a general T-tree utility method <TT>generateNextLevel</TT>(also described later).</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: generateLevel2Arguments: noneReturn: noneFields: startTtreeRef start of T-tree	nextlevelFlag set to true if next level exists--------------------------------------nextlevelFlag <-- falseloop from i = 0 to i = number of nodes in T-tree top level    if startTtreeRef[i] != null    		generateNextLevel(startTtreeRef,i,{i})End Loop--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 3</B>: <I>The</I> <TT>createTtreeTopLevel</TT> <I>method</I></P></CENTER><P>Once we have a top level T-tree and a set of candidate second levels(arrays) we can proceed with generating the rest of the T-tree using aniterative process --- the <TT>createTtreeLevelN</TT> method presented in Table4. The <TT>createTtreeLevelN</TT> method calls a number ofother methods <TT>addSupportToTtreeLevelN</TT>, <TT>pruneLevelN</TT> (alsocalled by the <TT>createTtreeTopLevel</TT> method) and<TT>generateLevelN</TT> which arepresented in Tables 5, 6 and 7 respectively.</P><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: createTtreeLevelNArguments: noneReturn: noneFields: startTtreeRef start of T-tree	nextlevelFlag set to true if next level exists--------------------------------------K <-- 2while (nextlevelFlag)    addSupportToTtreeLevelN(K)    pruneLevelN(startTtreeRef,K)    nextlevelFlag <-- false    generateLevelN(startTtreeRef,K,{})    K <-- K+1End loop--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 4</B>: <I>The</I> <TT>createTtreeLevelN</TT> <I>method</I></P></CENTER><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: addSupportToTtreeLevelNArguments: K the current levelReturn: noneFields: startTtreeRef start of T-tree	dataArray 2D array holding input sets--------------------------------------Loop from i = 0 to i = number of records in dataArray    length = number of attributes in dataArray[i]    addSupportToTtreeFindLevel(startTtreeRef,K,length,    		dataArray[i]End loop--------------------------------------Method: addSupportToTtreeFindLevelArguments: linkref reference to current array in T-tree	K level marker	length length of array at current branch in t-tree	record input data record under considerationReturn: noneFields: None--------------------------------------if (K=1)    Loop from i = 0 to i = length    	if (linkref[record[i]] != null)		increment linkref[record[i]].support by 1	End if    End Loopelse    Loop from i = K-1 to i = length        if (linkref[record[i]] != null &&			linkref[record[i]].childRef != null)	    addSupportToTtreeFindLevel(linkref[record[i]].childRe,	 		K-1,i,record)	End if    End loopend if else   		   	--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 5</B>: <I>The</I> <TT>addSupportToTtreeLevelN</TT> <I>method</I>and its related <TT>addSupportToTtreeFindLevel</TT> method</P></CENTER><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: pruneLevelNArguments: linkref reference to current array in T-tree	K level markerReturn: true if entire array prunedFields: minSupport the minimum support threshold--------------------------------------if (K=1)    allUnsupported <-- true    Loop from i = 1 to i = length of array    	if (linkref[i] != null)	    if (linkref[i].support < minSupport)	    		linkref[i] <-- null            else allUnsupported <-- false	    End if else	End if    return allUnsupported    End Loopelse    Loop from i = K to i = length of array        if (linkref[i] != null)	    if (pruneLevelN(linkref[i].childRef,K-1)	    	linkref[i].childRef <-- null	    End if	 End if    End loopEnd if else   		   	--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 6</B>: <I>The</I> <TT>pruneLevelN</TT></P></CENTER><CENTER><TABLE BORDER=1 CELLPADDING=5><TR><TD><ALIGN=LEFT><PRE>Method: generateLevelNArguments: linkref reference to current array in T-tree	K level marker	I the item set represented by the parent nodeReturn: NoneFields: None--------------------------------------if (K=1)    Loop from i = 2 to i = length of array	if (linkref[i] != null)	    generateNextLevel(linkref,i,I union i)	End if    End loopelse    Loop from i = K to i = length of array	if (linkref[i] != null && linkref[i].childRef != null)	    generateLevelN(linkref[i].childRef,K-1,I union i--------------------------------------Method: generateNextLevelArguments: linkref reference to current array in T-tree	i index to parent node in vurrent array	I the item set represented by the parent nodeReturn: NoneFields: nextLevelExists flafg set to true or false--------------------------------------linkref[i].childRef <-- array of empty t-tree nodes of length iLoop from j = 1 to j = i     if (linkRef[j] != null)         newI <-- I union j	 if (all newI true subsets are all supported in the	 		T-tree sofar)	     linkRef[i].childRef[j] <-- new T-tree Node	     nextLevelExists <-- true	 else linkRef[i].childRef[j] <-- null	 End if else     End ifEnd loop--------------------------------------</PRE></ALIGN></TD></TABLE><P><B>Table 7</B>: <I>The</I> <TT>generateLevelN</TT> <I>method</I>and its related <TT>generateNextLevel</TT> method</P></CENTER><br><hr><br><A name = "conclusions"><table BORDER-=0 WIDTH="100%" CELLPADDING=5 BGCOLOR="006699"><tr><td><h2><font color =" white">7. CONCLUSSIONS</font></h2></td></table> <BR><P>The Apriori-T ARM algorithm described herehave been use successfully by the LUCS-KDD reseach group to contrast and compare a variety of ARM algorithms and techniques. The software is available forfree, however the author would appreciate appropriate acknowledgement. The following reference format for referring to the Apriori-T implementation available hereis suggested:</P><OL><LI>Coenen, F. (2004), <I>The LUCS-KDDApriori-T Association Rule Mining Algorrithm</I>,http://www.cxc.liv.ac.uk/~frans/KDD/Software/Apriori_T/aprioriT.html,Department of Computer Science, The University of Liverpool, UK.</OL><P>Should you discover any "bugs" or other problems within the software (or this documentation), do not hesitate to contact the author.</P><br><hr><br><table BORDER-=0 WIDTH="100%" CELLPADDING=5 BGCOLOR="006699"><tr><td><h2><font color =" white">REFERENCES</font></h2></td></table> <BR><OL><LI>Agrawal, R. and Srikant, R. (1994),<I>t algorithms for mining association rules</I>,In proceedings of 20th VLDB Conference, Morgan Kaufman, pp487-499.<LI>Coenen and Leng (2004).<I>Data Structures for Association Rule Mining: T-trees and P-trees</I>To appear in IEEE Transaction in Knowledge and Data Engineering. </OL><br><hr><br><p>Created and maintained by<a HREF=http://www.csc.liv.ac.uk/~frans/>Frans Coenen</a>.Last updated 17 March 2004</body></html>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -