⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 mafia.cpp

📁 公认的比较有效的最大频繁集挖掘算法Mafia 算法源代码
💻 CPP
📖 第 1 页 / 共 3 页
字号:
/* Copyright (c) 2003, Cornell UniversityAll rights reserved. Redistribution and use in source and binary forms, with or withoutmodification, are permitted provided that the following conditions are met:   - Redistributions of source code must retain the above copyright notice,      this list of conditions and the following disclaimer.  - Redistributions in binary form must reproduce the above copyright      notice, this list of conditions and the following disclaimer in the      documentation and/or other materials provided with the distribution.  - Neither the name of Cornell University nor the names of its      contributors may be used to endorse or promote products derived from      this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THEIMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSEARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BELIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, ORCONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OFSUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESSINTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER INCONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OFTHE POSSIBILITY OF SUCH DAMAGE. *////////////////////////////////////////////////////////////////////////// Mafia.cpp/////////////////////////////////////////////////////////////////////////#define DEBUG#include <iostream>#include <fstream>#include <time.h>#include <stdio.h>#include <list>#include <vector>#include <algorithm>#include <string>#include <assert.h>#include <cmath>#include <map>#include "Bitmap.h"#include "TreeNode.h"#include "Transaction.h"#include "ItemsetOutput.h"using namespace std;/// @defgroup GlobalVariables Global Variables/// Global variables/** @{ */typedef vector<TreeNode *> NodeList;typedef vector<TreeNode *> BranchList;typedef vector<Bitmap *> BitmapList;typedef vector<BaseBitmap *> BaseBitmapList;typedef vector<int> ItemSet;typedef map<long, ItemSet *> HashTable;/// Simple class for storing subtree size estimatesclass SubtreeEstimate {public:    int Count;              ///< Count of actual subtrees counted    int Sum;                ///< Sum of subtree sizes    SubtreeEstimate () {        Count = 0;        Sum = 0;    }};/// Simple class for storing tail elements of each node of the treeclass TailElement {public:    int Count;              ///< Support of the 1-extension    int Item;               ///< Item-id for this1 -extension    TailElement () {        Count = 0;        Item = 0;    }    bool operator < (const TailElement& rhs) const {        return this->Count < rhs.Count;    };};/// @defgroup CommandLineParameters Command Line Parameters/Variables/// Commmand line parameters from user or inferred/** @{ */string method;               ///< either -mfi or -fci or -fichar* outFilename;           ///< filename for outputItemsetOutput *outFile;      ///< file for ouputbool outputMFI = false;      ///< true if MFI should be saved to a filebool MethodIsFI = false;     ///< true if the method is -fibool MethodIsFCI = false;    ///< true if the method is -fciint ItemCount;               ///< # of items in the fileint TransCount;              ///< # of transactions in the filedouble MSF;                  ///< user-defined min sup as percentageint MS;                      ///< min sup as a transaction countint VerScale = 1;            ///< Scaling factor for transactionsint HorScale = 1;            ///< Scaling factor for itemsbool GoFHUT = true;          ///< FHUT flagbool HUTMFI = true;          ///< HUTMFI flagbool PEPrune = true;         ///< PEPrune flag -- parent equivalent pruningbool Reorder = true;         ///< Reorder flag/** @} *//// @defgroup CounterVariables Counter Variables/// Counter variables for information gathering/** @{ */int CountFHUT = 0;           ///< # of times FHUT was successfulint CountNodes = 0;          ///< # of frequent nodes in the treeint CountCounts = 0;         ///< # of Counts or all nodes in the treeint CountAnds = 0;           ///< # of ANDs of normal bitmapsint CountSmallAnds = 0;      ///< # of compressed bitmap ANDsint CountPEPrunes = 0;         ///< # of PEPruningint CountCheckPosition = 0;  ///< # of CheckPosition callsint CountHUTMFI = 0;         ///< # of HUTMFI attemptsint CountHUTMFISuccess = 0;  ///< # of HUTMFI successesint CountRebuilds;           ///< # of Rebuilds/** @} *//// @defgroup ProgramVariables Program Parameters/Variables/// Useful program parameters/counters/** @{ */int maxtail = 0;int MFISize = 0;             ///< MFI size before pruningint MFIDepth = 0;            ///< The aggregated depth of the all MFI elementsint F1size = 0;              ///< # of frequent 1-itemsets after merging repeatsint FullF1size = 0;          ///< # of frequent 1-itemsetsint k = 50;                  ///< # of items checked for a MFI lookupint MAX_compID = 1;          ///< max compression IDint projectDepth = -1;       ///< depth of the bitmap you're projecting fromint EstimateSize;            ///< size of subtree estimation bufferint EstimateDiv = 5;         ///< bucket size by frequent tail lengthint maxItemsetSize = 0;      ///< max size of a frequent itemset/** @} *//// @defgroup DataVariables Data variables/// Complex data structure variables/** @{ */NodeList F1;                 ///< List of frequent 1-itemsetsBitmapList TransBuffy;       ///< Buffer of transaction bitmapsBaseBitmapList NameBuffy;    ///< Buffer of name bitmapsNodeList NodeBuffy;          ///< Buffer of tree nodessTreeNode *Root;              ///< The root (the nullset)TailElement *gTail;          ///< global tail pointerTailElement *TailBuffy;      ///< Common Buffer for tail elementsBitmap *NullTrans;           ///< A transaction bitmap filled with onesint *ItemMap;                ///< For renaming items after sorting by supportint *ReverseItemMap;         ///< For remembering the renaming...BaseBitmapList MFI;          ///< List of Maximally Frequent ItemsetsHashTable HT;                ///< Hash table of transaction supportsvector <int> SupportCountList;       ///< List that stores support countBaseBitmap* TempName;        ///< Temporary buffer for one name bitmapSubtreeEstimate* EstimateBuffy;      ///< Buffer of subtree estimatesint *MFIBySizes;             ///< Buffer for counting MFI by itemset sizeint *ItemsetBuffy;           ///< Buffer for writing itemsets to file output/** @} *//// @defgroup TimingVariables Timing Variables/// Variables for timing (and instrumenting the code)/** @{ */time_t total_start, total_finish;double total_time;time_t read_start, read_finish;double read_time;time_t algorithm_start, algorithm_finish;double algorithm_time;time_t print_start, print_finish;double print_time;/** @} *//** @} *//********************************************************************* Reading the data in and building the item bitmaps*********************************************************************//// @addtogroup InputOutput Input/Output Functions/// Reading the data in and building the item bitmaps./// Outputting the frequent itemsets./** @{ *///////////////////////////////////////////////////////////////////////// Insert pointer to node in order of increasing support////// @param newNode           item node to add to F1/////////////////////////////////////////////////////////////////////void AddToF1(TreeNode *newNode) {    if (F1.empty())        F1.push_back(newNode);    else {        // use insertion sort for increasing support ordering        NodeList::iterator noli = F1.begin();        while (noli != F1.end()) {            if ((*noli)->Trans->_count >= newNode->Trans->_count) {                F1.insert(noli, newNode);                return ;            }            noli++;        }        // Add to end of F1 list        F1.push_back(newNode);    }}//////////////////////////////////////////////////////////////////////// Create bitmaps filled randomly with probability p////// @param P - probability of a bit p being set/////////////////////////////////////////////////////////////////////void F1UsingProb(double P) {    int blah = 0;    //cout << "Creating bitmap with prob p=" << P << endl;    // Create frequent 1-itemsets with probability p    for (int i = 0; i < ItemCount / HorScale; i++) {        // Create random transaction bitmap        Bitmap *trans = new Bitmap(TransCount);        trans->FillRand(P);        // Add frequent items to list        if (trans->Count(blah) >= MS) {            TreeNode *node = new TreeNode(NULL, trans, 1, -1, i, -1, 0);            for (int r = 0; r < HorScale; r++)                AddToF1(node);        } else            delete trans;    }}//////////////////////////////////////////////////////////////////////// Read transaction data from file and build item bitmaps.///     If the data is in ascii form:///     [itemid_1] [itemid_2] ... [itemid_n]///     If the data is in binary form:///     [custid] [transid] [number of items] [itemid_1] [itemid_2] ... [itemid_n]////// @param filename          name of file to be read in/// @param isAsciiFile       true for ASCII input, false for binary/////////////////////////////////////////////////////////////////////void F1FromFile(char *filename, bool isAsciiFile) {    TransCount = 0;    int MAXitemID = -1;    int *Counters = new int[MAX_NUM_ITEMS];        //to count the support of items    int *F1IndexList = new int[MAX_NUM_ITEMS];     // the id's of F1 items    int *InvF1IndexList = new int[MAX_NUM_ITEMS];    int itemIndex = 0;    //Initialize counters    for (int ct = 0; ct < MAX_NUM_ITEMS; ++ct) {        Counters[ct] = 0;        F1IndexList[ct] = -1;        InvF1IndexList[ct] = -1;    }    time(&read_start);    BitmapList Trans;    int *itemlist = new int[MAX_NUM_ITEMS];    InputData *inData = new InputData(filename, itemlist, isAsciiFile);    if (!inData->isOpen()) {        cerr << "Input file not found!" << endl;        exit(1);    }    Transaction *newTransaction = inData->getNextTransaction();    while (newTransaction != NULL) {        for (itemIndex = 0; itemIndex < newTransaction->length; itemIndex++) {            // ensure that there are not too many items            if (newTransaction->itemlist[itemIndex] >= MAX_NUM_ITEMS) {                cerr << "Read item_id=" << newTransaction->itemlist[itemIndex]                << " which is more than the max item_id allowed (" << MAX_NUM_ITEMS << ")";                exit(1);            }            if (newTransaction->itemlist[itemIndex] > MAXitemID)                MAXitemID = newTransaction->itemlist[itemIndex];            Counters[newTransaction->itemlist[itemIndex]]++;        }        TransCount++;        delete newTransaction;        newTransaction = inData->getNextTransaction();    }    delete newTransaction;    delete inData;    MS = (int)ceil(MSF * (double)TransCount);    //MSF = MS/(double)TransCount;    ItemCount = MAXitemID + 1;    int F1items = 0;    // build the normal bitmaps -- Preallocated memory for the bitmaps    for (itemIndex = 0; itemIndex <= MAXitemID; itemIndex++) {        if (Counters[itemIndex] >= MS) {            F1IndexList[F1items++] = itemIndex;            InvF1IndexList[itemIndex] = F1items - 1;            Bitmap *trans = new Bitmap(TransCount);            trans->_count = Counters[itemIndex];            Trans.push_back(trans);        }    }    int transIndex = 0;    InputData *inData2 = new InputData(filename, itemlist, isAsciiFile);    newTransaction = inData2->getNextTransaction();    while (newTransaction != NULL) {        for (int itemIndex = 0; itemIndex < newTransaction->length; itemIndex++) {            if (InvF1IndexList[newTransaction->itemlist[itemIndex]] != -1) {                Trans[InvF1IndexList[newTransaction->itemlist[itemIndex]]]->FillEmptyPosition(transIndex);            }        }        transIndex++;        delete newTransaction;        newTransaction = inData2->getNextTransaction();    }    delete newTransaction;    delete inData2;    delete [] itemlist;    time(&read_finish);    read_time = difftime(read_finish, read_start);    printf("Reading input time:    %f seconds.\n", read_time);    // Create F1    BitmapList::iterator bli = Trans.begin();    itemIndex = 0;    while (bli != Trans.end()) {        TreeNode *node = new TreeNode(                             NULL,                             (*bli),                             1,                             -1,                             F1IndexList[itemIndex],                             -1,                             0);        AddToF1(node);        bli++;        itemIndex++;    }    delete [] Counters;    delete [] F1IndexList;    delete [] InvF1IndexList;}//////////////////////////////////////////////////////////////////////// To print the MFI data out to an ASCII file///     with each entry having the format:///     [list of items in MFI entry...] [(support)]/////////////////////////////////////////////////////////////////////void PrintMFI() {    // open output file    outFile = new ItemsetOutput(outFilename);    if (!outFile->isOpen()) {        cerr << "Output file not open!" << endl;        exit(1);    }    if (FullF1size != 0) {        // translate bitmap to list of INTs        int* ITEMS = new int[FullF1size];        for (int i = 0; i < MFISize; i++) {            int j = 0;            for (int cc = 0; cc < FullF1size; cc++) {                if (MFI[i]->CheckPosition(cc, CountCheckPosition) > 0) {                    ITEMS[j] = ItemMap[cc];                    j++;                }            }            outFile->printSet(MFI[i]->_count, ITEMS, SupportCountList[i]);        }        delete [] ITEMS;    } else {        outFile->printSet(0, NULL, TransCount);    }    delete outFile;}/** @} *//********************************************************************* Algorithmic components*********************************************************************//// @defgroup AlgorithmicComponents Algorithmic Component Functions/// Algorithm components (HUTMFI, PEP, etc.)/** @{ *///////////////////////////////////////////////////////////////////////// Check for an existing superset of name in the MFI////// @param location          The node we're at.  Use this to examine only///                          the relevant portion of the MFI/// @return True - if superset found.///         False - if no superset./////////////////////////////////////////////////////////////////////bool LMFISuperSet(TreeNode* location) {    return (location->rEnd > location->rBegin);}//////////////////////////////////////////////////////////////////////// Output itemset (don't need to save bitmaps for FI)////// @param C                 the current node/////////////////////////////////////////////////////////////////////void AddToFI(TreeNode *C) {    int itemsetIndex = 0;    for (int cc = 0; cc < FullF1size; cc++) {        if (C->Name->CheckPosition(cc, CountCheckPosition) > 0) {            ItemsetBuffy[itemsetIndex] = ItemMap[cc];            itemsetIndex++;        }    }    MFIBySizes[C->Name->_count]++;    if (C->Name->_count > maxItemsetSize)        maxItemsetSize = C->Name->_count;    if (outputMFI)        outFile->printSet(C->Name->_count, ItemsetBuffy, C->Trans->_count);    // Update stat variables

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -