prefixspan --- An Implementation of PrefixSpan

Author:
  Taku Kudo <taku-ku@is.aist-nara.ac.jp>
  Nara Institute of Science and Technology,
  Graduate School of Information Science,
  Computational Linguistics Laboratory

License:
  GPL2 (GNU General Public License Version 2)

Reference:
  J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu,
  PrefixSpan: Mining Sequential Patterns Efficiently by
  Prefix-Projected Pattern Growth,
  Proc. 2001 Int. Conf. on Data Engineering (ICDE'01),
  Heidelberg, Germany, April 2001.
  http://www.cs.sfu.ca/~peijian/personal/publications/span.pdf

Requirements:
  C++ compiler with STL (Standard Template Library).

Install:
  % make

Usage:
  ./prefixspan [options] < data

  options:
  -m NUM:   set minimum support (default: 1)
  -M NUM:   set minimum pattern length (default: 1)
  -L NUM:   set maximum pattern length (default: 0xffffffff)
  -t TYPE:  set item type, choose from [string|int|short|char]
            (default: string)
  -a:       print ALL patterns (default: no, print longest pattern only)
  -w:       print the list of transaction IDs where the pattern occurs
            (default: no)
  -d STR:   use STR as delimiter between item and freq. (default: "/")
  -v:       set verbose mode (print the size of transactions first)

Format of input data:

  foo bar do
  foo foo bar
  i you he
  she me

  Each line corresponds to one transaction, which is a list of items
  separated by a single space. For example, the first transaction has
  3 items (foo, bar, do).

  If the sequential order of items does not matter, sort the items of
  each transaction in dictionary order:

  bar do foo
  bar foo foo
  he i you
  me she

Format of results:

  item/freq. item/freq. ...
  item/freq. item/freq. ...
  ..

  Here is an example:

  bar/187 foo/113
  do/170 bar/134
  she/100
  i/501 by/232 the/108

  This result means:

  SEQUENTIAL PATTERN : FREQUENCY
  bar                : 187
  bar -> foo         : 113
  do                 : 170
  do -> bar          : 134
  she                : 100
  i -> by -> the     : 108
  i -> by            : 232
  i                  : 501

  Each line represents the longest sequential pattern whose frequency
  is larger than minsup (-m option).
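
  To make the result format concrete, here is a minimal C++ sketch that
  expands one result line into every prefix pattern it encodes. It is
  not part of prefixspan: the names Pattern and expand_result_line are
  hypothetical, and it assumes output produced without the -a and -w
  options.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One sequential pattern together with its frequency.
// (Hypothetical type for illustration; not part of prefixspan.)
struct Pattern {
    std::vector<std::string> items;
    unsigned long freq;
};

// Expand a result line such as "i/501 by/232 the/108" into the prefix
// patterns it encodes: (i, 501), (i by, 232), (i by the, 108).
// `delim` mirrors the -d option and defaults to "/".
std::vector<Pattern> expand_result_line(const std::string& line,
                                        const std::string& delim = "/") {
    std::vector<Pattern> out;
    std::vector<std::string> prefix;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        // Split at the LAST delimiter so that an item which itself
        // contains the delimiter is kept intact.
        const std::string::size_type pos = token.rfind(delim);
        if (pos == std::string::npos) continue;  // skip malformed tokens
        prefix.push_back(token.substr(0, pos));
        Pattern p;
        p.items = prefix;  // the pattern is the prefix read so far
        p.freq = std::stoul(token.substr(pos + delim.size()));
        out.push_back(p);
    }
    return out;
}
```

  For example, expand_result_line("bar/187 foo/113") yields two
  patterns: (bar, 187) and (bar foo, 113).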

  The -M NUM1 and -L NUM2 options restrict the length of the extracted
  patterns. The -d option changes the delimiter between item and freq
  (default: "/").

  Note that any prefix of a longest pattern is also a sequential
  pattern. With the -a option, you obtain ALL patterns, that is, every
  prefix of each longest pattern. Here is an example:

  187 bar
  113 bar foo
  170 do
  134 do bar
  100 she
  501 i
  232 i by
  108 i by the

  With the -w option, the list of transaction IDs where each pattern
  occurs is also printed. Here is an example:

  * without -a option

  <pattern>
  <what>bar/187 foo/113</what>
  <where>54 141 218 264 295 472 768 839 900 931</where>
  </pattern>

  * with -a option

  <pattern>
  <freq>187</freq>
  <what>bar</what>
  <where>54 141 218 264 295 472 768 839 900 931 .... </where>
  </pattern>

  <pattern>
  <freq>113</freq>
  <what>bar foo</what>
  <where>54 141 218 264 295 472 768 839 900 931 .... </where>
  </pattern>

  Each result is enclosed in a "<pattern>" tag. The pattern is given
  in the "<what>" tag, and the transaction IDs are listed in the
  "<where>" tag. With -a, the frequency appears in a separate "<freq>"
  tag.
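
  The tagged output of the -w option can be read back with plain string
  searches. The C++ sketch below is one possible way to do it, not
  something shipped with prefixspan: Occurrence, extract_tag, and
  parse_patterns are hypothetical names. It assumes tags are never
  nested and appear exactly as in the examples above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One <pattern> record from -w output.
// (Hypothetical type for illustration; not part of prefixspan.)
struct Occurrence {
    std::string what;                  // raw contents of <what>
    std::vector<unsigned long> where;  // transaction IDs from <where>
};

// Copy the text between <tag> and </tag> into `body`, searching from
// `from`. On success `from` is moved past the closing tag.
bool extract_tag(const std::string& text, const std::string& tag,
                 std::string::size_type& from, std::string& body) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::string::size_type b = text.find(open, from);
    if (b == std::string::npos) return false;
    b += open.size();
    const std::string::size_type e = text.find(close, b);
    if (e == std::string::npos) return false;
    body = text.substr(b, e - b);
    from = e + close.size();
    return true;
}

// Parse every <pattern> record found in `text`.
std::vector<Occurrence> parse_patterns(const std::string& text) {
    std::vector<Occurrence> out;
    std::string::size_type pos = 0;
    std::string block;
    while (extract_tag(text, "pattern", pos, block)) {
        Occurrence occ;
        std::string::size_type p = 0;
        if (!extract_tag(block, "what", p, occ.what)) continue;
        std::string ids;
        p = 0;
        if (extract_tag(block, "where", p, ids)) {
            // Reading stops at the first non-numeric token.
            std::istringstream in(ids);
            unsigned long id;
            while (in >> id) occ.where.push_back(id);
        }
        out.push_back(occ);
    }
    return out;
}
```

  The contents of "<what>" are kept as a raw string because their
  layout differs with and without -a (item/freq pairs versus plain
  items).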