charm: a well-known algorithm for mining association rules from vertically formatted datasets

1) Generate a data file using the IBM data generator program, gen,
       OR
   start with an ASCII file (see the example file chess.ascii in this directory).
   The format of the ASCII/binary file is
        <cid> <tid> <numitem> <item list>
   i.e.
        CID TID #ITEMS LIST_OF_ITEMS
   e.g.
        1   1   4       0 1 4 6
        2   2   3       4 7 9

2) If you start from an ASCII file, first convert it to binary using makebin:
        makebin chess.ascii chess.data
   The binary file MUST have a .data extension.

3) Generate the configuration file by running getconf (gen generates the .conf
   file automatically, so you can skip this step if you used gen):
        getconf -i chess -o chess -a
   Before continuing, you should have the following files:
        chess.data
        chess.conf

4) Run:
        exttpose -i XXX -o XXX -l -s LMINSUP -a 0
   for example:
        exttpose -i chess -o chess -l -s 0.2 -a 0
   or
        exttpose -i chess -o chess -l -s 0 -a 0
   (the latter allows any MINSUP to be used later).
   Note: this produces the files XXX.tpose and XXX.idx. The XXX.tpose file is
   the DB in vertical format, and XXX.idx is an index file specifying where the
   tid-list for each item begins.
   You can set LMINSUP to the same value you will use to run charm below, in
   which case you will have to rerun exttpose each time you use a new, lower
   MINSUP. Alternatively, you can use a small value for LMINSUP, and the output
   will continue to work for all values of MINSUP >= LMINSUP when you run charm.
   The time for inverting is stored in summary.out.
   The format is:
        TPOSE DB_FILENAME X NUMITEMS TOTAL_TIME
   (see the note on TOTAL_TIME below)
   You should now have the following files:
        chess.data
        chess.conf
        chess.tpose
        chess.idx

5) Run charm:
        charm -i XXX -e 1 -d -l -s <MINSUP>
   Other flags:
        -o    output the patterns found
              output format: itemset - sup (tidset)
        -d    use diffsets instead of tidsets (from length 3 onwards)
        -l    use diffsets for pass 2 as well
              (this should NOT be used for sparse datasets, since for sparse
              data the pass-2 tidsets are smaller than the corresponding
              diffsets)
        -H 1  mine exact closed sets (takes longer); the default is to mine a
              superset of the closed sets, and -H 1 eliminates any non-closed
              sets using a hashing technique

To run charm directly on a horizontal DB:
        charm -i XXX -h -d -l -s <MINSUP>
   The -h option converts the horizontal DB XXX.data into an in-memory
   vertical DB, so this mode should only be used with small DBs that fit in
   memory. If you need to run a large DB, I have other scripts that first
   create a disk-based vertical DB and then run charm on that DB. I think the
   current version is sufficient for most experiments, but if you will do
   performance tests on large DBs, please ask for the other scripts.
   If you use the -h option, there is no need to run exttpose.

MINSUP is given as a fraction, e.g., specify 0.5 for 50% minimum support or
0.01 for 1% support.

Note that the summary of the run is stored in the file summary.out.
The format of this file is as follows:
        CHARM (other options) DB_FILENAME MINSUP NUMTRANS_IN_DB ACTUAL_SUPPORT
        [ ITER_i |Ci| |Fi| timeForIter_i avg_tidset/diffset_size ]
        [TOT total_cands tot_freq tot_elapsed_time]
        NumberofIntersections XXX XXX XXX XXX
        tottime maxiters user_time sys_time

Note 3: The -e 1 option tells charm to compute the support of 2-itemsets from
scratch. The number 1 says there is only one DB partition, which will be
inverted entirely in main memory. If the original DB is large, this inversion
will take too much time, so in this case I recommend dividing the DB into
chunks of roughly 5MB (assuming 32MB is available to the process). The
exttpose program is equipped to handle this case: if you pass the -p NUMPART
flag to exttpose, it will divide the DB into NUMPART chunks. You can then run
charm with the -e NUMPART option. You must do this if the DB is large;
otherwise the timings for charm will be skewed. Generally, the more
partitions, the better the running time for charm. For example:
        exttpose -i XXX -o XXX -l -a 0 -s LMINSUP -p 10
        charm -i XXX -s MINSUP -e 10

In summary, for the current project run:
        charm -i chess -e 1 -s 0.8 -o
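
The inversion in steps 1, 2 and 4 can be illustrated with a small in-memory sketch. It parses records in the <cid> <tid> <numitem> <item list> format and inverts them into per-item tid-lists, dropping items below an LMINSUP threshold; a flat array plus an offset index stands in for XXX.tpose and XXX.idx. This is a conceptual model only: the binary layouts actually written by makebin and exttpose are not described in this readme, and the function names parse_ascii and transpose, as well as rounding LMINSUP up to a transaction count, are assumptions for illustration.

```python
import math
from collections import defaultdict


def parse_ascii(text):
    """Parse '<cid> <tid> <numitem> <item list>' lines into (cid, tid, items) tuples."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        cid, tid, numitem = (int(x) for x in fields[:3])
        items = [int(x) for x in fields[3:]]
        if len(items) != numitem:
            raise ValueError(f"line {lineno}: expected {numitem} items, got {len(items)}")
        records.append((cid, tid, items))
    return records


def transpose(records, lminsup):
    """Invert a horizontal DB into per-item tid-lists (vertical format).

    Hypothetical stand-in for exttpose: items whose support is below
    ceil(lminsup * number_of_transactions) are dropped (round-up is an assumption).
    Returns (flat, index): flat concatenates the kept tid-lists, and index maps
    item -> (offset, length) into flat, playing the role of the .idx file.
    """
    tidlists = defaultdict(set)
    for _cid, tid, items in records:
        for item in items:
            tidlists[item].add(tid)
    min_count = math.ceil(lminsup * len(records))
    flat, index = [], {}
    for item in sorted(tidlists):
        tids = sorted(tidlists[item])
        if len(tids) >= min_count:
            index[item] = (len(flat), len(tids))  # where this item's tid-list begins
            flat.extend(tids)
    return flat, index


sample = """1 1 4 0 1 4 6
2 2 3 4 7 9"""
records = parse_ascii(sample)
flat, index = transpose(records, 0.0)
print(flat)   # [1, 1, 1, 2, 1, 2, 2]
print(index)  # {0: (0, 1), 1: (1, 1), 4: (2, 2), 6: (4, 1), 7: (5, 1), 9: (6, 1)}
```

With lminsup=0.6 instead of 0.0, only item 4 (present in both transactions) survives in this sketch, which mirrors why a small LMINSUP keeps the vertical DB usable for any later MINSUP >= LMINSUP.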
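
The -d and -l flags switch charm between tidsets and diffsets. The following minimal sketch shows the bookkeeping, assuming the standard diffset identities for a common prefix P with two extensions PX and PY: d(PXY) = t(PX) - t(PY) when starting from tidsets (as in pass 2 with -l, where P is empty), d(PXY) = d(PY) - d(PX) once both parents are diffsets, and sup(PXY) = sup(PX) - |d(PXY)|. Converting a fractional MINSUP to a transaction count by rounding up is also an assumption; the readme only states that MINSUP is given as a fraction. All function names are hypothetical.

```python
import math


def absolute_minsup(minsup_fraction, num_transactions):
    """Fractional MINSUP (e.g. 0.01 for 1%) -> minimum transaction count (rounded up)."""
    return math.ceil(minsup_fraction * num_transactions)


def tidset_join(t_px, t_py):
    """Tidset of PXY: the transactions containing both PX and PY."""
    return t_px & t_py


def diffset_from_tidsets(t_px, t_py):
    """Diffset of PXY relative to PX, built from tidsets: t(PX) - t(PY)."""
    return t_px - t_py


def diffset_join(d_px, d_py):
    """Diffset of PXY from diffsets that share the prefix P: d(PY) - d(PX)."""
    return d_py - d_px


def support_from_diffset(sup_px, d_pxy):
    """sup(PXY) = sup(PX) - |d(PXY)|."""
    return sup_px - len(d_pxy)


# Toy vertical DB: 6 transactions over items A, B, C.
tidsets = {"A": {1, 3, 4, 5, 6}, "B": {1, 2, 3, 4, 6}, "C": {1, 2, 3, 5, 6}}

# Pass 2 with diffsets (what -l enables): the prefix is empty, so start from tidsets.
d_ab = diffset_from_tidsets(tidsets["A"], tidsets["B"])  # {5}
d_ac = diffset_from_tidsets(tidsets["A"], tidsets["C"])  # {4}
sup_ab = support_from_diffset(len(tidsets["A"]), d_ab)   # 4

# Length 3 (what -d enables): combine diffsets that share the prefix A.
d_abc = diffset_join(d_ab, d_ac)                         # {4}
sup_abc = support_from_diffset(sup_ab, d_abc)            # 3

# Same answer as intersecting tidsets directly.
t_abc = tidset_join(tidset_join(tidsets["A"], tidsets["B"]), tidsets["C"])
print(sup_abc, len(t_abc), sup_abc >= absolute_minsup(0.5, 6))  # 3 3 True
```

On sparse data, t(X) - t(Y) is typically almost as large as t(X) itself while t(X) ∩ t(Y) is small, which matches the readme's advice not to use -l there.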
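
By default charm mines a superset of the closed sets, and -H 1 removes the non-closed ones with a hashing technique. The readme does not describe that hash, so the following is only an illustration of the idea, not the program's implementation: an itemset is closed if no proper superset has the same support, and because a superset's tidset is contained in the subset's tidset, equal support means equal tidsets. Bucketing itemsets by tidset therefore restricts the subsumption check to itemsets in the same bucket. The function name closed_only is hypothetical.

```python
from collections import defaultdict


def closed_only(itemsets):
    """Drop non-closed itemsets from {frozenset(items): frozenset(tids)}.

    X is non-closed if some proper superset Y has the same tidset. Itemsets are
    hashed by tidset, so subsumption is only checked within each bucket.
    """
    buckets = defaultdict(list)
    for items, tids in itemsets.items():
        buckets[tids].append(items)
    closed = {}
    for tids, group in buckets.items():
        for items in group:
            if not any(items < other for other in group):  # '<' is proper subset
                closed[items] = tids
    return closed


# B always occurs together with A, so {B} has the same tidset as {A, B}.
mined = {
    frozenset({"A"}): frozenset({1, 2, 3}),
    frozenset({"B"}): frozenset({1, 2}),
    frozenset({"A", "B"}): frozenset({1, 2}),
}
for items, tids in closed_only(mined).items():
    print(sorted(items), len(tids))
```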
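
Note 3 relies on the fact that 2-itemset supports can be counted one partition at a time: because the partitions are disjoint sets of transactions, the per-partition counts simply add up, so only one chunk has to be inverted in memory at once. The following minimal sketch of that idea assumes contiguous, roughly equal partitions; how exttpose -p and charm -e actually split and store the chunks is not described in this readme, and all names here are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_into_partitions(transactions, num_parts):
    """Split (tid, items) records into at most num_parts contiguous chunks."""
    if not transactions:
        return []
    size = -(-len(transactions) // num_parts)  # ceiling division
    return [transactions[i:i + size] for i in range(0, len(transactions), size)]


def count_pairs_in_partition(partition):
    """Invert one chunk into in-memory tid-lists and count every co-occurring item pair."""
    tidlists = defaultdict(set)
    for tid, items in partition:
        for item in items:
            tidlists[item].add(tid)
    counts = Counter()
    for a, b in combinations(sorted(tidlists), 2):
        common = len(tidlists[a] & tidlists[b])
        if common:
            counts[(a, b)] = common
    return counts


def count_pairs(transactions, num_parts):
    """Sum per-partition pair counts; the chunks are disjoint, so counts add up."""
    total = Counter()
    for part in split_into_partitions(transactions, num_parts):
        total.update(count_pairs_in_partition(part))  # Counter.update adds counts
    return total


db = [(1, [0, 1, 4, 6]), (2, [4, 7, 9]), (3, [0, 4, 9]), (4, [1, 4])]
pairs = count_pairs(db, 3)
print(pairs[(0, 4)], pairs[(1, 4)], pairs == count_pairs(db, 1))  # 2 2 True
```

The partition count changes only how much of the DB is held in memory at once, not the resulting supports.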
