⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 myapriori.java

📁 数据挖掘中并行关联规则中经典Apriori算法的Java源代码实现。
💻 JAVA
字号:
package apriori;

/**
* formatted with JxBeauty (c) johann.langhofer@nextra.at<br>
*/


import java.util.*;
import java.sql.*;
import oracle.jdbc.pool.*;
/**


* A program to find association rules with the apriori algorithm (Agrawal et al. 1993).<br>
* Other than the standard apriori algorithm, this program enable to find<br>
* apriori all relation.<br>
*<br>
* The program able to run on any SQL92 enable database which have the schema<br>
* like this:<br>
* <pre>
* Table "web_record"
* Attribute | Type | Modifier
* -------------+----------------------+----------
* customer_id | character varying(5) | not null
* view_seq | integer | not null
* vroot_id | character varying(4) | not null
* </pre>
*<br>
* Which are the customerID, transactionSequence and transactionID.<br>
* You should able to modify all data that can be run apriori all algorithm with<br>
* this data structure.<br>
*<br>
* Here is an data parser example to parse the Microsoft Anonymous Web Data<br>
* http://kdd.ics.uci.edu/databases/msweb/msweb.html<br>
* <br>
* Here is the example running command:<br>
* $JAVA_HOME/bin/java -Xmx128m Apriori -s 1 -c 5 -seq<br>
* $JAVA_HOME/bin/java -Xmx128m Apriori -s 2 -c 8<br>
* And the result will display on the screen<br>
*
* The command line option:<br>
* -s: support value, 1 mean only support larger than 1% will display<br>
* -c: confident value, 5 mean only confident larger than 5% will display<br>
* -seq: the program will run apriori all algorithm with this value<br>
*/
public class myApriori {
private String strDriver = "com.microsoft.jdbc.sqlserver.SQLServerDriver";
private String url = "jdbc:microsoft:sqlserver://127.0.0.1:1433;DatabaseName=master;SelectMethod=cursor";
private String tableName = "web_record";
private String countField = "counts";
private String customerIdField = "vroot_id";
private String transactionIdField = "customer_id";
private String transactionSequenceField = "view_seq";

private double support;
private double confident;
private boolean aprioriAll;
private Connection conn;
private int totalCustomer = 0;
private HashMap supportCount; // Held the itemSet and the support of that itemSet

private ResultSet execSQL (String sql) throws Exception {
PreparedStatement stmt = conn.prepareStatement(sql);
return stmt.executeQuery();
}

private void addRule (List values, List supports, List confidences, List allItemSets,
List thisSubSet, List remainSet, float wholeCount, float baseCount) {
if (aprioriAll) {
List checkList = new ArrayList((List)thisSubSet);
checkList.addAll(remainSet);
if (allItemSets.contains(checkList)) {
values.add(remainSet);
supports.add(new Float((float)wholeCount/(float)totalCustomer));
confidences.add(new Float((float)wholeCount/(float)baseCount));
}
}
else {
values.add(remainSet);
supports.add(new Float((float)wholeCount/(float)totalCustomer));
confidences.add(new Float((float)wholeCount/(float)baseCount));
}
}

/**
* The default construstor, specific the support, confident value, and run
* apriori all algorithm or not
* @param double support
* @param double confident
* @param boolean aprioriAll
*/
public myApriori (double support, double confident, boolean aprioriAll) throws Exception
{
/* String connectURI = "jdbc:oracle:thin:@192.168.0.224:1521:orahzoa";
String userName = "apriori";
String passWord = "apriori";
Class.forName("oracle.jdbc.driver.OracleDriver").newInstance();
conn = DriverManager.getConnection(connectURI, userName, passWord);
*/
String userName = "sa";
String passWord = "sa";
Class.forName(strDriver).newInstance();
conn = DriverManager.getConnection(url, userName, passWord);
// Class.forName(strDriver).newInstance();
// conn = DriverManager.getConnection(url);
supportCount = new HashMap();
this.support = support;
this.confident = confident;
this.aprioriAll = aprioriAll;
}

/**
* The method to parse the string array argumnents
* @param String[] args
* @return
* @exception Exception
*/
public static myApriori init (String[] args) throws Exception
{
double confidence = 0.0;
double support = 0.0;
boolean aprioriAll = false;
if (args.length > 0) {
for (int i = 0; i < args.length; i++) {
if ("-s".equals(args[i])) {
int tempSupport = Integer.parseInt(args[i + 1]);
if (tempSupport > 0) {
support = (double)(tempSupport)/100;
}
i++;
}
if ("-c".equals(args[i])) {
int tempConfidence = Integer.parseInt(args[i + 1]);
if (tempConfidence > 0) {
confidence = (double)(tempConfidence)/100;
}
i++;
}
if ("-seq".equals(args[i])) {
aprioriAll = true;
}
}
}
return new myApriori(support, confidence, aprioriAll);
}

/**
* Find the total no. of customer and use this to calculate the support for
* the rules belong to that customer.
* @return
* @exception Exception
*/
public int getTotalCustomer () throws Exception {
if (totalCustomer == 0) {
String sql = "select count(distinct("+customerIdField+")) from "+tableName+"";
ResultSet rs = execSQL(sql);
if (rs.next())
totalCustomer = rs.getInt(1);
}
return totalCustomer;
}

/**
* Construct the SQL<br>
* <br>
* 1) select a0.transactionId, count(a0.customerId) from tableName a0 group by a0.transactionId;<br>
* 2) select a0.transactionId, a1.transactionId, count(a0.customerId) from tableName a0, tableName a1 where<br>
* a0.transactionSequence < a1.transactionSequence and a0.customerId=a1.customerId and <br>
* a0.transactionId in ([previous appear item]) and a0.transactionId in ([previous appear item])<br>
* group by a0.transactionId, a1.transactionId;<br>
* .....<br>
* .....<br>
* 3) select a0.transactionId, ...., aN.transactionId, count(a0.customerId) from tableName a0, ......, tableName aN where <br>
* a0.transactionSequence < a1.transactionSequence and .... and a(N-1).transactionSequence < aN.transactionSequence and<br>
* a0.customerId=a1.customerId and .... and a(N-1).customerId=aN.customerId<br>
* a0.transactionId in ([previous appear item]) and .... and aN.transactionId in ([previous appear item])<br>
* group by a0.transactionId, ...., aN.transactionId;<br>
*<br>
* If we not do the ApriorAll algorithm, we just need to remove the statement:<br>
* a0.transactionSequence < a1.transactionSequence and .... and a(N-1).transactionSequence < aN.transactionSequence<br>
* and add the following statement:<br>
* a0.transactionId < a1.transactionId and .... and a(N-1).transactionId < aN.transactionId<br>
* because we can remove the duplication entries<br>
*
* @param noOfItem the SQL of that no. of itemsets
* @param previous the previous items find
* @return The SQL
*/
public String getSQL (int noOfItem, List previous) {
String transactionIDs = " a0."+transactionIdField+"";
String tables = " "+tableName+" a0";
String aprioriAllFreq = "";
String skipDupTransaction = "";
String customerIDs = "";
String previousItems = "";
String searchItems = " ";

if (previous != null) {
StringBuffer sb = new StringBuffer(100);
// Extrace the items from the previous itemsets gotten
for (int i = 0; i < previous.size(); i++) {
ArrayList itemSet = (ArrayList)previous.get(i);
for (int j = 0; j < itemSet.size(); j++) {
String item = Integer.toString(((Integer)itemSet.get(j)).intValue());
if ( sb.toString().indexOf(item) == -1 )
sb.append(",").append(item);
}
}
previousItems = sb.substring(1); //Don't need the first ','
}
//Contruct the SQL
if ( previousItems.length() > 0 ) {
searchItems += " and a0."+transactionIdField+" in (" + previousItems + ") ";
}
for (int i = 1; i < noOfItem; i++) {
transactionIDs += ", a" + i + "."+transactionIdField+"";
tables += ", "+tableName+" a" + i;
customerIDs += " and a" + (i - 1) + "."+customerIdField+" = a" + i + "."+customerIdField+"";

if ( previousItems.length() > 0 )
searchItems += " and a" + i + "."+transactionIdField+" in (" + previousItems + ") ";

if (aprioriAll)
aprioriAllFreq += " and a" + (i - 1) + "."+transactionSequenceField+" < a" + i + "."+transactionSequenceField+"";
else
skipDupTransaction += " and a" + (i - 1) + "."+transactionIdField+" < a" + i + "."+transactionIdField+"";
}
// Special handling for SQL of first itemset
if (aprioriAllFreq.equals("") && customerIDs.equals("")) {
return "select " + transactionIDs + ", count(a0."+customerIdField+") as counts from " +
tables + " group by " + transactionIDs;
}
else {
customerIDs = customerIDs.substring(4);
return "select " + transactionIDs + ", count(a0."+customerIdField+") as counts from " +
tables + " where " + customerIDs + skipDupTransaction + aprioriAllFreq
+ searchItems + " group by " + transactionIDs;
}
}

/**
* Find that N th itemsets
* @param no_i the N th itemset to find
* @param pastItemsSet The previous itemset to limit the search
* @return The list contain that N th itemsets
* @exception Exception
*/
public List findNthItemSet (int no_i, List pastItemsSet) throws Exception {
ArrayList itemSet = new ArrayList();
String sql = getSQL(no_i, pastItemsSet);
// System.out.println(sql);
// sql="select user from dual";
PreparedStatement stmt = conn.prepareStatement(sql);
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
// Get distinct transactionID combination to run
// set perpare statement also
ArrayList items = new ArrayList();
for (int i = 0; i < no_i; i++) {
Integer transactionID = new Integer(rs.getInt(i + 1));
items.add(transactionID);
}
int count = rs.getInt(countField);
if (count >= getTotalCustomer()*support) {
itemSet.add(items);
supportCount.put(items, new Integer(count));
}
}
rs.close();
stmt.close();
return itemSet;
}

/**
* Find all itemSets avaliable for the given support
* @return A List array.<br>
* First item is the list contains lists of itemsSets according<br>
* the the Nth no. e.g.: { {{1,2}, {2,3}}, {{1,2,3},{2,3,4}} }<br>
* Second item is the list contains all itemsSets without<br>
* separation. e.g.: { {1,2}, {2,3}, {1,2,3},{2,3,4} }<br>
* @exception Exception
*/
public List[] findItemSets () throws Exception {
ArrayList allNthItemSet = new ArrayList();
ArrayList allItemSets = new ArrayList();

// Find all frequency itemset from the given support
List pastItemsSet = null;
int i = 0;
while(true) {
i++;
List nthItemSet = findNthItemSet(i, pastItemsSet);
if (nthItemSet.size() > 0) {
allNthItemSet.add(nthItemSet);
allItemSets.addAll(nthItemSet);
}
else
break; //break when no more items find
pastItemsSet = nthItemSet;
}
List[] returnList = {
allNthItemSet, allItemSets
};
return returnList;
}

/**
* Find all rules according to the given support and confident
* @param allNthItemSet the list contains lists of itemsSets according<br>
* to the Nth no. e.g.: { {{1,2}, {2,3}}, {{1,2,3},{2,3,4}} }<br>
*
* @param allItemSets the list contains all itemsSets without<br>
* separation. e.g.: { {1,2}, {2,3}, {1,2,3},{2,3,4} }<br>
*
* @return A Map array.<br>
* First item is the map of rules of that key.<br>
* Second item is the map of supports of that key.<br>
* Third item is the map of confidence values of that key.<br>
*/
public Map[] findRules (List allNthItemSet, List allItemSets) {
HashMap rules = new HashMap();
HashMap rules_support = new HashMap();
HashMap rules_confidence = new HashMap();
// Find all the rules from support
// For all levels of itemsets
for (int j = 0; j < allNthItemSet.size(); j++) {
List nthItemSet = (List)allNthItemSet.get(j);
// For all items of that level
for (int k = 0; k < nthItemSet.size(); k++) {
List thisItemSet = new ArrayList((List)nthItemSet.get(k));
// Find All subset from all items
for (int l = 0; l < allItemSets.size(); l++) {
List thisSubSet = new ArrayList((List)allItemSets.get(l));
List remainSet = new ArrayList((List)thisItemSet);
remainSet.removeAll(thisSubSet);
// Both SubSet and remaining set must contain in that itemset
if (thisItemSet.containsAll(thisSubSet) && thisItemSet.containsAll(remainSet)
&& remainSet.size() > 0) {
int wholeCount = ((Integer)supportCount.get(thisItemSet)).intValue();
int baseCount = ((Integer)supportCount.get(thisSubSet)).intValue();
// Check if the confident enough
if ((float)wholeCount/(float)baseCount >= confident) {
// Use list to hold all values because the is more than one rules of that key
ArrayList values = (ArrayList)rules.get(thisSubSet);
ArrayList supports = (ArrayList)rules_support.get(thisSubSet);
ArrayList confidences = (ArrayList)rules_confidence.get(thisSubSet);
// Check if that key item found previously
if (values != null) {
// If that value item found previously, no need to re-add
if (!values.contains(remainSet)) {
addRule(values, supports, confidences,
allItemSets, thisSubSet, remainSet,
wholeCount, baseCount);
}
}
else {
// Create new list to hold the values, support and confidences if new key item
values = new ArrayList();
supports = new ArrayList();
confidences = new ArrayList();
addRule(values, supports, confidences, allItemSets,
thisSubSet, remainSet, wholeCount,
baseCount);
rules.put(thisSubSet, values);
rules_support.put(thisSubSet, supports);
rules_confidence.put(thisSubSet, confidences);
}
}
}
}
}
}
Map[] returnMap = {
rules, rules_support, rules_confidence
};
return returnMap;
}

/**
* The main program to run and print out the rules
* @param args
* @exception Exception
*/
public static void main (String[] args) throws Exception {
long start = System.currentTimeMillis();
myApriori dm = init(args);

List[] itemSets = dm.findItemSets();
List allNthItemSet = itemSets[0];
List allItemSets = itemSets[1];

Map[] rulesContents = dm.findRules(allNthItemSet, allItemSets);
Map rules = rulesContents[0];
Map rules_support = rulesContents[1];
Map rules_confidence = rulesContents[2];

//print out rules
Iterator it = rules.keySet().iterator();
System.out.println("Rules:\t\t\tSupport:\tConfident:");
System.out.println("------\t\t\t--------\t----------");
while (it.hasNext()) {
Object key = it.next();
List valueList = (List)rules.get(key);
List valueList_support = (List)rules_support.get(key);
List valueList_confidence = (List)rules_confidence.get(key);
for (int x = 0; x < valueList.size(); x++) {
System.out.println(
// rules gotten
key + "->" + valueList.get(x) + "\t\t" +
// support of that rules
((Float)(valueList_support.get(x))).floatValue()*100 + "%\t" +
// confident of that rules
((Float)(valueList_confidence.get(x))).floatValue()*100 + "%");
}
}
System.out.println("time needed: " + (System.currentTimeMillis() -
start));
}
}

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -