📄 readme

📁 这是一个用于数据挖掘的常用算法的模板库（数据挖掘的C++模板库for UNIX)
💻
📖 第 1 页 / 共 2 页
字号:
上一页 12
NOTE: In order to explain the example better, unlike the previous examples, here we are using ints as the pattern element type.The tree miner runs with strings as well.The above trees can be visualized as follows:1.      1       /\      2  3 2.      3       /\      2  1        /\       2  33.      2       /      14.     15.     3      /\     1  26.     27.      3       /      1     /    2   /  18.    1     /    3The command line and output is as follows:test$ ./tree_test -i ../data/TREE_int_toy.data -s 2 -I -O -pCommand line parameters given were -infile=../data/TREE_int_toy.dataminsup=2print flag=1induced flag=1ordered flag=13 level-1 patterns foundFREQUENT PATTERNS -2 -- Support: 63 -- Support: 51 -- Support: 72  ---  1 -- Support: 23  ---  2 -- Support: 23  ---  1 -- Support: 33  ---  1  ---  2 -- Support: 21  ---  2 -- Support: 31  ---  3 -- Support: 31  ---  2  <--  3 -- Support: 210 patterns foundTiming Data =================:Wall clock time to read db in:0.00113702TOTAL wall clock time taken:0.00346684Interpreting Tree Output:-------------------------In the above test, we ran the tree miner to mine induced and ordered trees from the above dataset.The minimum support was set to 2. 10 frequent patterns were found. The last pattern '1  ---  2  <--  3' is found in 2 of the above trees.This pattern can be visualized as:          1         /\        2  3The '---' between two nodes indicates a forward edge, whereas '<--' between 2 nodes indicates thatyou need to go one level towards the root to add the following node.GRAPH RUN:----------test$./graph_test -i ../data/Graph_str_toy2.data -s 2 -p3 frequent single-edged patterns foundFREQUENT PATTERNSCanonical code: 0 1 B 2 BSupport: 2Canonical code: 0 1 B 3 CSupport: 2Canonical code: 0 1 A 1 BSupport: 2Canonical code: 0 1 B 2 B1 2 B 3 CSupport: 2Canonical code: 0 1 A 1 B1 2 B 2 BSupport: 2Canonical code: 0 1 A 1 B1 2 B 2 B1 3 B 3 CSupport: 2Canonical code: 0 1 A 1 B1 2 B 3 CSupport: 2TIMING STATISTICSTime taken to read db in: 6.69956e-05secTime taken in isomorphism checks: 0.00236869secTime taken in VAT intersection: 0.000332355secTotal time taken: 0.006253sec7 total frequent graphs foundOutput help for Graph:---------------------In the above graph mining, we got 7 frequent graph, of which 3 of them are single-edge graph.Our library prints a graph by printing its minimal canonical code. This code lists the edgesof the graph in such an order that the code is minimal. Note that, minimal canonical code of a graph is always unique. DMTL representation of canonical code is identical to gspan graphmining algorithm. Each edge is the code is represented by a 5-tuple. For instance, canonical code of an edge(v1, v2) is: <dfs_id_v1, dfs_id_v2, label_v1, label_edge, label_v2>dfs_id_v1 : Depth-first order of the vertex v1 in the minimal canonical labelingdfs_id_v2 : Depth-first order of the vertex v2 in the minimal canonical labelinglabel_v1: Label of vertex v1label_v2: label of vertex v2label_ede: Label of the edge (v1, v2)For example, the following canonical code(left) represent the following graph (right)                                                   1              30 1 A 1 B	                           A -------------- B ---------- C1 2 B 3 C    --------------------->         \  			        / 1 3 B 3 C                                     \ _ _ _ _ _ _ _ _ _ _ __ /                                                           3MULTISET RUN:------------test$ cat ../data/MS_str_toy.data 1 1 8 AC CA AB BC CC CA AA AA2 2 6 CB BC CA AC CC CA3 3 4 CB BC CA AC4 4 3 CA AC CBtest$ ./multiset_test -i ../data/MS_str_toy.data -s 2 -pCommand line parameters given were -infile=../data/MS_str_toy.dataminsup=2print flag=1Size of length one patterns = 5AC CA -- Support: 4AC BC -- Support: 3AC CC -- Support: 2AC CB -- Support: 3AC CA CA -- Support: 2AC CA BC -- Support: 3AC CA CC -- Support: 2AC CA CB -- Support: 3AC CA CA BC -- Support: 2AC CA CA CC -- Support: 2AC CA CA BC CC -- Support: 2AC CA BC CC -- Support: 2AC CA BC CB -- Support: 2AC BC CC -- Support: 2AC BC CB -- Support: 2CA CA -- Support: 2CA BC -- Support: 3CA CC -- Support: 2CA CB -- Support: 3CA CA BC -- Support: 2CA CA CC -- Support: 2CA CA BC CC -- Support: 2CA BC CC -- Support: 2CA BC CB -- Support: 2BC CC -- Support: 2BC CB -- Support: 2AC -- Support: 4CA -- Support: 4BC -- Support: 3CC -- Support: 2CB -- Support: 3The following statistics are purely for performance measurementWall clock time taken to read db in:0.00028491TOTAL wall clock time taken:0.0030641631 patterns foundInterpreting Output for Multiset:--------------------------------The output of multiset mining is similar as that of itemset mining. The only difference being that itemscan reapeat in a single transaction. For example the pattern {CA, CA} has a support 2, i.e. {CA, CA} appeared in 2transaction of the above datasets given in file MS_str_toy.dataNOTES FOR DEVELOPERS:=====================DMTL is a generic pattern mining algorithm. It uses template arguments to dispatch right algorithmto mine a specific pattern. User need to specify through templates, what kind of pattern theylike to mine and what kind of mining approach they like to use.Each pattern in represented as a collection of relational properties, like directed, no_edges,undirected, cyclic, indegree_lte_one (indegree <= 1), outdegree_lte_one (outdegree <= 1), etc.A list of property is passed as template argument to instantiate a pattern. See the test filesin the test directory, to see how the properties were passed for common pattern mining. Note that,a proplist data structure is used that recursively define the list of properties for a pattern.To keep the code clean, we used #define to shorthand the long recursive definition of this pattern template arguments.  Similarly we defined the variation of mining algorithms by a list of mining properties. We specified these also through proplist and again used the#define directive. Available properties are listed in the file called properties.h in the src/common directory. Note that properties in a proplist data structure should be ordered according to a property hierarchy. The hierarchy is shown in a file called 'prop_hierarchy.eps' in the 'doc' directory.Please read the doxygen documentation to learn more about different templated class and how to instantiatethose. For any further question, please email to: zaki@cs.rpi.eduEXTENDING DMTL:===============The objective of DMTL is to employ the generic programming idea to develop a mining library that is highlyflexible to future extension. One obvious extension is to mine some other complex pattern.If your pattern can be defined by the properties that we already defined, it's  great. Otherwise, justadd new properties in properties.h file. Now any mining algorithm needs to have three major operation:1. Candidate generation2. Isomorphism checking3. Support CountingOur current library provides these algorithm for four different patterns. If your pattern can shareany of the above four, you can use them right away, otherwise provide code for that operation only.For a detail discussion, see our paper (lcsd05.pdf) in doc directory. For any further question, pleaseemail to the authors (emails are available in AUTHORS file).
上一页 12
💿 文件大小 3269 K
👤 上传用户 d_zhihua
📂 所属分类人工智能/神经网络
🏷️ 相关标签

#UNIX #for #数据挖掘 #模板
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -